Preface


This  report  describes  a  new  target  acquisition  performance  model  which  uses  the 
Targeting  Task  Performance  (TTP)  metric.  Like  its  predecessor,  the  famous  Johnson 
criteria,  the  new  model  assumes  that  range  performance  is  proportional  to  image  quality. 
Simplicity  of  implementation  is  therefore  maintained.  However,  the  TTP  model  predicts 
image  quality  in  a  different  fashion.  In  addition  to  overall  better  accuracy,  the  TTP  metric 
can  be  used  to  model  sampled  imagers,  high  frequency  boost,  non-white  noise,  and  other 
features  of  modern  imagers  which  cannot  be  accurately  modeled  with  the  Johnson 
criteria. 

The  Johnson  criteria  are  used  almost  universally  to  predict  range  performance.  Johnson 
uses  the  resolving  power  of  an  imager  as  a  metric  of  sensor  “goodness”  for  target 
acquisition  purposes.  For  a  given  target  to  scene  contrast,  resolving  power  is  the  highest 
spatial  frequency  passed  by  the  sensor  and  display  and  visible  to  the  observer.  He 
multiplies  the  resolving  power  of  the  imager  (in  cycles  per  milliradian)  by  the  target  size 
(in  milliradians)  to  get  “cycles  on  target.”  Johnson  published  a  table  of  the  number  of 
cycles  on  target  needed  to  detect,  recognize,  identify,  or  perform  other  target  acquisition 
tasks;  these  are  his  “criteria”  for  target  acquisition.  The  basic  assumption  underlying  the 
Johnson  metric  is  that  all  electro-optical  imagers  are  the  same  in  some  broad  sense.  The 
performance of the imager can be determined solely by the highest spatial frequency (fj)
visible  at  the  average  target  to  background  contrast.  When  the  Johnson  method  works,  it 
is  not  because  fj  is  important  per  se,  but  rather  because  an  increase  in  fj  represents  an 
improvement  in  the  contrast  rendition  at  all  spatial  frequencies.  However,  with  sampled 
imagers,  fj  is  more  an  indicator  of  sample  rate  than  image  quality.  Also,  because  the 
Johnson  metric  is  based  on  the  system  response  at  a  single  frequency,  it  cannot  predict 
the  effect  of  tailoring  the  image  frequency  spectrum  through  digital  processing.  For 
example,  the  benefits  of  edge  sharpening  by  high  frequency  boost  cannot  be  predicted. 

In Appendix A of this report, the predictions of the Johnson criteria are shown to be
fundamentally flawed because the criteria are insensitive to imager characteristics below
the limiting frequency. This flaw makes predictions for many modern imaging systems inaccurate.
Experimental  data  show  the  problems  with  the  Johnson  criteria  and  illustrate  the  robust 
performance  of  the  TTP  metric.  The  simplicity  of  implementing  a  range  performance 
model  with  the  Johnson  criteria  is  retained  by  the  new  metric  while  extending 
applicability  of  the  model  to  sampled  imagers  and  digital  image  enhancement. 

The  new  target  acquisition  model  includes  another  fundamental  change,  also  described  in 
this  report.  The  current  models  to  predict  Minimum  Resolvable  Temperature  and 
Minimum  Resolvable  Contrast  were  introduced  in  1995.  Products  like  the  NVTherm 
thermal  model  and  the  SSCAM  (Solid  State  Camera)  TV  model  differ  from  their 
predecessors  because  the  contrast  limitations  of  vision  are  incorporated  into  these  models. 
Incorporating  the  eye  contrast  limitations  allows  the  modeling  of  image  intensifiers,  TV, 
and  sensitive  thermal  imagers  which  were  previously  not  accurately  modeled.  However, 
the  1995  model  set  continued  the  use  of  the  “matched  eye  filter.”  Since  this  filter  does  not 
reflect psychophysical reality, those models are only accurate when the noise is spectrally
flat  (white)  compared  to  the  signal.  Digital  image  processing,  particularly  high  frequency 
boost  or  image  restoration,  can  lead  to  a  distinctly  non-white  noise  spectrum.  The 
modeling  of  modern  imagery  requires  a  change  in  the  eye  filters. 

Because  the  Johnson-based  models  have  been  widely  used  for  so  long,  considerable 
attention  is  paid  in  this  report  to  the  history  and  assumptions  underlying  the  older  model. 
In  addition,  the  new  TTP  model  is  described  in  detail.  Also,  the  theory  predicting  human 
contrast  threshold  when  using  an  EO  imager  is  thoroughly  documented. 

Introduction

In Figure 1.1, the soldier is using an imaging sensor, hoping to quickly identify whether
the tank is a threat. This report describes a model which predicts the probability that he
correctly  identifies  the  target.  The  problem  is  tackled  in  two  parts.  First,  the  soldier’s 
quality  of  vision  when  using  the  sensor  and  display  is  quantified.  Most  of  the  report  is 
devoted  to  this  topic.  Second,  the  relationship  between  quality  of  vision  and  performing  a 
visual  task,  such  as  identifying  the  tank,  is  discussed. 


Figure  1.1  Soldier  Trying  to  Identify  a  Tank  as  Friend  or  Foe 


The  theory  in  this  report  is  couched  in  terms  of  the  observer  viewing  the  world  through 
the  imager.  The  imager  extends  the  observer’s  vision  because  it  provides  advantages  over 
human  eyesight.  The  target  can  be  magnified;  that  is,  the  angle  subtended  by  the  target  at 
the  eye  can  be  greatly  increased,  making  it  easier  to  see.  The  imager  also  lets  the  observer 
see  light  outside  the  visible  wavelengths,  often  a  great  advantage  because  the  target 
signatures  are  more  robust.  There  is  more  night  illumination  at  near  infrared  wavelengths 
than  the  visible,  for  example,  so  that  image  intensifiers  work  better  in  that  spectral  band. 
Another  example  is  thermal  imagery,  which  does  not  depend  at  all  on  natural 
illumination.  On  the  negative  side,  however,  the  imager  blurs  the  target  and  adds  noise  to 
the  viewed  scene. 

The degradations due to imager noise and blur are in addition to the natural limitations of
human  eyesight.  If  the  imager  were  perfect — no  blur  from  the  optics,  detector,  or  display 
and  no  noise  from  the  photo-detection  process — the  observer’s  range  performance  would 
still  be  limited  by  his  vision.  Image  quality  results  from  the  inherent  limitations  of  human 
vision  in  combination  with  imager  blur  and  noise.  The  limitations  of  human  vision 
depend,  in  turn,  upon  the  display  luminance  and  contrast. 

The  most  widely  used  measures  of  image  quality  are  visual  acuity  and  resolving  power. 
Visual  acuity  has  the  connotation  that  high  contrast  (black  on  white)  letters  or  symbols 
are  used  to  check  vision.  The  observer  who  reads  the  smallest  letters  has  the  best  visual 
acuity.  With  sensors,  the  term  resolving  power  has  the  same  connotation.  Bar  patterns  are 
generally  used  for  imagers;  the  best  imager  displays  the  smallest  bar  pattern.  Although 
commonly  used  and  easy  to  test,  these  high-contrast  measures  do  not  adequately  quantify 
how  well  a  person  can  see  with  the  naked  eye  or  through  the  imager. 

A scene consists of many luminance levels. The eye achieves an integrated view of
objects by connecting lines and surfaces. These lines and surfaces do not share a
particular brightness throughout their extent. For example, the background immediately
behind a target might not be uniform, and yet the eye sees a full or partial silhouette.
Perspective is gained from converging lines which might vary in both sharpness and
luminance with increasing range. Slight changes in hue or texture can provide an
excellent cue as to the distance and orientation of an object and possibly indicate the
nature of the surface characteristics. Acute vision requires the ability to discriminate
small differences in gray shade, not just the ability to discriminate small details which
happen to have good contrast.

In Figure 1.2, the picture of Goldhill has an average modulation contrast of 0.22. The
3-bar charts to the right have contrasts of 0.04, 0.08, 0.16, and 0.32 with average luminance
equal to the average of the picture. When noise is added and the picture blurred, as shown
at the bottom of Figure 1.2, high contrast details are still visible, but low contrast details
disappear. This is illustrated by the bar charts at the bottom which were degraded in the
same way as the Goldhill picture. A quantification of visual performance requires that
resolution be measured for all shades of gray in the image. The means of achieving this
quantification is described later.

Figure 1.2. Picture of Goldhill and 3-bar Charts of Various Contrasts
Measuring resolution for the average or peak contrast does not adequately quantify
picture quality.

The model proposed here is more complex than resolving power; hopefully the need for
this added complexity will become apparent as the report proceeds. For the present, we
quote Lucien Biberman who is quoting G.C. Brook (Chapter 8 in Biberman, 1973; Brook,
1965).

“Before we can make progress in the use of our new techniques it will be
necessary to bypass two obstacles, the first of which is the existence and
firm establishment of resolving power, and the second is the belief that
science will give us one number quality index that will supplant all
previous evaluation techniques.

“Resolving power has been in use for so long that it has come to be
thought  of  as  something  fundamental  which  determines  other  aspects  of 
image  quality  and  has  some  very  special  significance.  Whenever  a  new 
criterion  of  image  quality  is  proposed,  we  at  once  ask  ‘How  does  it  relate 
to  resolving  power?’  instead  of  considering  it  in  more  general  terms.  And 
because  resolving  power  is  used  for  so  many  different  purposes,  and  gives 
a  one  number  answer,  it  is  assumed  that  any  new  technique  must  be 
inferior  if  it  does  not  do  the  same.  As  we  have  already  seen,  resolving 
power  can  serve  many  purposes  because  it  does  not  serve  any  of  them 
well.” 

The  imager  model  must  account  for  both  hardware  characteristics  and  human  vision.  In 
EO  imagers,  blur,  noise,  and  contrast  all  limit  our  ability  to  see  details.  Further,  unless  the 
display  is  big  and  bright,  the  physiological  limitations  of  the  eye  cannot  be  ignored.  A 
picture  might  appear  grainy  when  presented  at  high  display  luminance  and  not  noisy  at  all 
when  presented  at  low  display  luminance.  This  does  not  mean  that  the  picture  is  better  in 
some  quantitative  sense  when  presented  at  low  display  luminance;  our  inability  to  see  the 
noise implies an equivalent inability to see contrast gradations within the scene itself.
Hardware  characteristics  do  not,  by  themselves,  establish  image  quality.  Rather,  hardware 
characteristics  interact  with  human  vision  to  establish  how  well  the  scene  is  perceived 
through  the  imager. 

Depending  on  scene  conditions  and  sensor  control  settings,  the  dominant  hardware  factor 
limiting  performance  can  be  blur,  noise,  or  contrast.  Blur  results  primarily  from  factors 
like  diffraction  or  aberrations  in  the  objective  lens  and  summing  of  the  incoming  light 
over  the  detector  instantaneous  field  of  view.  Summing  the  light  from  different  points  in 
the  scene  results  in  the  blurring  of  scene  detail.  Noise  is  generally  associated  with  the 
photo-detection  process.  In  the  theoretical  limit,  signal  is  proportional  to  the  number  of 
photo-electrons  generated  in  the  detector.  Noise  is  proportional  to  the  square  root  of  the 
number  of  photo-electrons.  Contrast  can  be  degraded  by  the  atmosphere.  For  example, 
sunlight  scattered  by  the  atmosphere  into  the  sensor  line-of-sight  can  seriously  degrade 
contrast.  Contrast  can  also  be  degraded  by  the  glare  of  ambient  light  reflecting  off  the 
display  or  by  improper  display  settings.  Blur,  noise,  and  contrast  limit  our  ability  to  see 
detail  and  therefore  limit  our  ability  to  identify  targets  or  to  discriminate  between  target 
and  background. 
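
The photon-limited scaling described above can be made concrete with a short numerical sketch. It assumes unit proportionality constants (signal equal to the photo-electron count, noise equal to its square root); the function name `shot_noise_snr` and the sample counts are illustrative only and are not values from this report.

```python
import math


def shot_noise_snr(photoelectrons: float) -> float:
    """Signal-to-noise ratio in the photon-limited case.

    Assumes unit proportionality constants: signal = N, noise = sqrt(N).
    """
    if photoelectrons <= 0:
        raise ValueError("photoelectron count must be positive")
    signal = photoelectrons
    noise = math.sqrt(photoelectrons)
    return signal / noise


if __name__ == "__main__":
    # Illustrative counts only; each step collects 100x fewer photo-electrons.
    for count in (1_000_000, 10_000, 100):
        print(f"N = {count:>9,d}  SNR = {shot_noise_snr(count):8.1f}")
```

Under these assumptions, a hundredfold drop in collected photo-electrons lowers signal to noise by a factor of ten, which is consistent with noise becoming the dominant limitation as scene illumination falls.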

Figure  1.3a  is  a  thermal  image  of  a  tank.  The  tank  has  been  exercised,  and  the  road 
wheels  and  engine  are  hot,  giving  the  tank  a  thermal  signature  which  is  distinct  from  the 
background.  In  Figures  1.3b  and  1.3c,  the  tank  is  viewed  from  progressively  greater 
distance.  Optical  magnification  is  used  so  that  the  tank  appears  to  be  the  same  size,  but 
diffraction  in  the  objective  lens  has  blurred  the  tank’s  details.  Noise  is  not  visible  in  the 
image; the tank is difficult to identify at the longest range because of the blur. In Figure
1.3d, the tank has cooled off. In order to see the tank, the gain on the imager is increased.
Increasing imager gain in Figure 1.3e makes the tank visible, but also makes the detector
noise visible. In Figure 1.3f, noise associated with photo-detection obscures the tank
image.

Blur  and  noise  also  affect  the  performance  of  reflected  light  sensors.  Generally, 
performance  is  limited  by  blur  or  contrast  under  good  illumination  conditions  and  by 
noise  under  poor  illumination.  This  is  because,  in  the  theoretical  limit,  signal  to  noise  is 
proportional to the square root of photo-current. As illumination decreases, photo-current
decreases, and noise becomes more dominant.

Figure 1.3 Thermal Image of a Tank Showing Effects of Blur and Noise
At top, pristine image in (a) is blurred by imager in (b) and further blurred in (c). At
bottom, pristine but low-contrast image in (d) becomes noisy images in (e) and (f)
because increased gain amplifies detector noise.

Figure 1.4a shows a visible band image of a tank. In Figures 1.4b and 1.4c, the tank is
viewed from progressively greater distance. Optical magnification is used so that the tank
appears to be the same size, but diffraction in the objective lens has blurred the tank’s
details. Noise is not visible in the image; the tank is difficult to identify at the longest
range because of the blur. In Figure 1.4d, illumination has decreased and the tank is not
visible. In Figure 1.4e, the camera gain is increased and the tank is again visible, but the
low illumination makes the picture noisy. In Figure 1.4f, illumination has decreased to
the point that the tank is not visible in the noise.

A  third  factor  important  in  determining  performance  of  night  vision  sensors  is  display 
contrast, especially when display luminance is less than photopic. In Figure 1.5, the
picture  of  Lena  becomes  clearer  from  left  to  right  because  contrast  increases;  neither 
signal  to  noise  nor  blur  changes.  Contrast  limitations  are  especially  important  when 
display  luminance  is  low;  the  eye’s  ability  to  discriminate  gray  levels  in  an  image 
degrades  as  display  luminance  decreases. 

Figure 1.4 Visible Image of a Tank Showing Effects of Blur and Noise


Figure  1.5  Picture  of  Lena  Showing  Contrast  Increasing  from  Left  to  Right 


Low  display  luminance  might  occur  because  of  imager  limitations.  For  example,  due  to 
insufficient  light  gain,  early  image  intensifiers  provided  less  than  0.01  foot  Lamberts  (fL) 
eyepiece  luminance  under  starlight  scene  illumination.  (10  fL  is  considered  low  photopic 
luminance.)  Early  attempts  to  model  image  intensifier  performance  failed  because  the 
model  was  based  on  signal  to  noise  in  the  image.  However,  at  the  low  display  luminance, 
neither  the  signal  nor  the  noise  was  clearly  visible  to  the  observer.  Image  intensifiers 
were  not  accurately  modeled  until  the  contrast  limitations  of  the  eye  were  incorporated 
into  the  model. 

Low  display  luminance  is  not  uncommon.  Display  luminance  might  be  low  because  the 
operator  chooses  to  maintain  dark  adaptation.  During  night  flight,  for  example,  military 
pilots  flying  without  goggles  set  instrumentation  displays  to  between  0.1  and  0.3  fL;  this 
permits  reasonable  viewing  of  the  instruments  while  maintaining  dark  adaptation  in  order 
to  see  outside  the  aircraft.  Regardless  of  the  reason  for  a  non-optimized  display,  the  result 
is  degraded  human  performance  when  using  an  imager.  It  is  common  and  even  typical  for 
the  display  luminance  of  a  night  vision  device  to  be  less  than  a  foot  Lambert,  and  this  low 
display  luminance  is  an  important  factor  in  determining  the  performance  of  the  night 
vision  imager. 

All  four  factors  affecting  performance — the  blur,  noise,  and  contrast  of  the  imager  as  well 
as  the  physiological  limitations  of  the  eye  in  adjusting  to  a  non-optimized  display — must 
be handled by the model. All four factors affect the targeting performance expected from
the  imager.  For  both  reflective  and  thermal  imagery,  performance  is  generally  limited 
simultaneously  by  a  combination  of  factors.  That  is,  the  image  is  not  less  blurred  just 
because  noise  is  present. 

The  theory  in  this  report  covers  all  types  of  EO  imagers.  Imagers  of  reflected  light  like 
sunlight  or  starlight  operate  in  the  spectral  band  between  0.4  and  2  microns.  Thermal 
imagers  sense  emitted  light  (that  is,  heat).  Thermal  imagers  operate  in  the  mid-wave 
infrared  (3  to  5  microns)  or  the  long-wave  infrared  (8  to  12  microns).  These  spectral 
bands  are  defined  by  atmospheric  “windows”  with  good  transmission.  The  units  used  to 
describe  signal  and  noise  for  thermal  imagers  are  different  than  the  units  used  when 
modeling  reflected  light  sensors.  However,  aside  from  the  details  of  calculating  signal 
and  noise,  the  basic  target  acquisition  theory  is  exactly  the  same.  In  both  cases,  the 
observer  is  looking  at  a  display  of  the  blurred  and  noisy  image  of  a  target.  The  model 
predicts  the  effect  of  blur,  noise,  and  display  characteristics  on  target  acquisition  task 
performance. 

That  is  not  to  say  that  interpreting  thermal  imagery  is  as  easy  as  understanding  a  picture 
in  the  visible  spectral  band.  For  most  people,  imagery  becomes  progressively  harder  to 
interpret as the wavelength increases from visible to near infrared to short-wave infrared.
Thermal  imagery,  which  is  emissive  rather  than  reflective,  is  very  difficult  to  interpret  for 
the  untrained  observer.  However,  the  difficulty  of  the  observer’s  task  is  included  in  the 
target  acquisition  model,  not  in  the  image  quality  model.  The  same  image  model  is  used 
for  all  imagers. 

Traditionally,  thermal  scenes  are  characterized  with  absolute,  blackbody  temperature 
differences,  and  thermal  imager  frequency  response  is  measured  with  4-bar  patterns. 
Illuminated  scenes  are  characterized  by  contrast,  and  the  frequency  response  of  reflected 
light  imagers  is  characterized  with  3-bar  patterns.  An  absolute  temperature  difference  in 
the  scene  can,  of  course,  be  converted  to  a  contrast,  just  as  a  contrast  can  be  algebraically 
converted  to  an  absolute  illumination  difference.  The  main  difference  in  the  historical 
treatment  of  thermal  and  reflected  light  imagers  is  that  the  two  are  characterized  using 
different  bar  patterns. 

In  this  report,  all  imagers  are  treated  the  same.  The  choice  between  absolute  differences 
in the scene or scene contrast is arbitrary. However, as discussed in Part 3, contrast is
normally  used  to  characterize  the  eye.  The  use  of  contrast  when  modeling  EO  imagers 
simplifies  the  presentation  of  the  theory.  Further,  it  is  customary  to  use  sinewave  patterns 
for  eyeball  measurements,  and  the  use  of  sinewaves  is  consistent  with  our  sensor  model. 
Fourier  theory  is  used  to  model  system  blur  and  noise.  The  development  of  a  target 
acquisition  metric  is  made  easier  by  characterizing  human  vision  with  sinewaves;  this 
allows  easy  integration  of  the  eye  behavior  into  the  Fourier  frequency  domain  model. 

It  is  understood  that  sinewave  measurements  are  not  practical  in  the  laboratory.  However, 
there  is  a  known  relationship  between  bar  chart  response  (either  3-  or  4-bar)  and 
sinewave  response.  These  conversions  are  described  where  appropriate  in  the  theory 
sections  on  specific  imaging  technologies. 

Most of the report is devoted to predicting how well the observer can see through the
imager.  Our  ultimate  goal,  however,  is  to  predict  how  well  the  observer  can  detect, 
recognize,  or  identify  targets.  To  meet  that  goal,  an  image  quality  metric  is  needed  as  a 
link  between  quality  of  vision  and  task  performance. 

The  Johnson  criteria  are  used  almost  universally  to  predict  range  performance  based  on 
sensor  resolution.  Johnson  proposed  that  an  imager’s  utility  for  target  acquisition 
purposes  was  proportional  to  its  resolving  power  (Johnson,  1958).  That  is,  for  a  given 
target  to  scene  contrast,  the  highest  spatial  frequency  passed  by  the  sensor  and  display 
and  visible  to  the  observer  determines  the  probability  that  an  observer  can  correctly 
identify  a  tactical  vehicle  or  perform  other  visual  discrimination  tasks.  He  used  his 
limiting-resolution  metric  to  establish  criteria  for  target  acquisition  tasks. 

Johnson  performed  some  engineering  experiments  using  image  intensifiers  to 
simultaneously  view  bar-charts  and  scale  models  of  tactical  vehicles.  He  published  a 
table  giving  the  required  “cycles  on  target”  for  a  0.5  probability  of  detecting,  recognizing, 
identifying,  and  other  levels  of  target  discrimination.  “Cycles  on  target”  is  the  imager’s 
bar  resolution  in  cycles  per  milliradian  multiplied  by  the  angular  subtense  of  the  target  in 
milliradians.  Johnson  used  the  target’s  critical  dimension  to  determine  its  angular  size; 
critical  dimension  corresponds,  more  or  less,  to  the  minimum  of  the  target’s  height  or 
width  as  viewed  by  the  sensor.  D’Agostino  later  substituted  the  square  root  of  viewed 
area  for  critical  dimension,  and  updated  the  cycle  criteria  needed  for  target 
discriminations  (Howe,  1993). 
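
As a worked illustration of the arithmetic above, the sketch below computes cycles on target for a hypothetical vehicle using both the critical dimension and the square root of viewed area. The function names, the target dimensions, the range, and the 4 cycles-per-milliradian resolution are made-up inputs, and the viewed area is approximated as height times width purely for illustration. The small-angle approximation (meters divided by kilometers gives milliradians) is assumed.

```python
import math


def angular_subtense_mrad(size_m: float, range_km: float) -> float:
    """Small-angle subtense in milliradians (meters / kilometers)."""
    return size_m / range_km


def cycles_on_target(resolution_cy_per_mrad: float,
                     size_m: float, range_km: float) -> float:
    """Imager bar resolution multiplied by the target's angular subtense."""
    return resolution_cy_per_mrad * angular_subtense_mrad(size_m, range_km)


def critical_dimension_m(height_m: float, width_m: float) -> float:
    """Johnson-style size: roughly the minimum of viewed height and width."""
    return min(height_m, width_m)


def root_area_dimension_m(viewed_area_m2: float) -> float:
    """Later substitution: square root of the viewed area."""
    return math.sqrt(viewed_area_m2)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formulas.
    height_m, width_m = 2.3, 6.0
    range_km = 2.0
    resolution = 4.0  # cycles per milliradian at the target contrast

    johnson_size = critical_dimension_m(height_m, width_m)
    area_size = root_area_dimension_m(height_m * width_m)  # rectangle assumed

    print("critical dimension:", cycles_on_target(resolution, johnson_size, range_km))
    print("root viewed area:  ", cycles_on_target(resolution, area_size, range_km))
```

The two size definitions give different cycle counts for the same target and sensor, which is why the cycle criteria had to be updated when the square root of viewed area replaced the critical dimension.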

The  Johnson  metric  uses  limiting  bar-chart  resolution  as  an  indicator  of  sensor  goodness 
for  target  acquisition  purposes.  Predictive  accuracy  of  this  metric  is  best  when  comparing 
“like”  sensors  and  conditions.  The  metric  is  not  compatible  with  many  features  found  in 
modern sensors. For example, it is not compatible with sampled imagers. Further, the
Johnson  metric  fails  to  predict  the  impact  of  frequency  boost  on  range  performance. 

The  basic  assumption  underlying  the  Johnson  methodology  is  that  all  electro-optical 
imagers  are  the  same  in  some  broad  sense.  The  performance  of  the  imager  can  be 
determined solely by the limiting resolution frequency (fj) visible at the average target to
background contrast. When the Johnson criteria work, it is not because fj is important per
se,  but  rather  because  an  increase  in  fj  represents  an  improvement  in  the  contrast 
rendition  at  all  spatial  frequencies.  However,  with  sampled  imagers,  fj  is  more  an 
indicator  of  sample  rate  than  image  quality.  Further,  as  pointed  out  persistently  by  Fred 
Rosell,  the  Johnson  metric  fails  to  accurately  predict  the  effect  of  noise  on  task 
performance.  The  observer  appears  to  require  more  sensor  resolution  when  the  resolution 
is  noise  limited  as  opposed  to  spatial  frequency  response  limited  (Rosell,  1979  &  2000). 

The  desired  approach  to  modeling  sampled  imagers  is  to  incorporate  a  targeting  metric 
that  does  not  have  the  problems  associated  with  the  Johnson  metric.  Work  on  a 
replacement  metric  started  several  years  ago  (Vollmerhausen,  07/2000  and  Driggers, 
2000).  This  report  describes  how  the  new  TTP  (Targeting  Task  Performance)  metric  is 
calculated  and  used.  The  logic  of  this  metric  is  discussed  by  Barten;  TTP  is  similar  to 
Barten’s  SQRI  (Square  Root  Integral),  except  that  linear  rather  than  logarithmic 
integration  is  used  (Barten,  1999).  It  is  also  similar  to  van  Meeteren’s  Integrated  Contrast 
Sensitivity  (Task,  1976  and  Tannas,  1985).  A  variety  of  experiments  were  performed 
showing the problem with the Johnson criteria and illustrating the robust behavior of the
new  TTP  metric  (Vollmerhausen,  2003  and  Appendix  A). 
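
The distinction between linear and logarithmic integration can be illustrated numerically. The sketch below is not the report's implementation: the integrand (square root of target contrast over the system contrast threshold), the 1/ln 2 normalization on the logarithmic form, the frequency limits, and the `toy_ctf` threshold curve are all assumptions made for illustration, based on the commonly published forms of such metrics. Only the difference between integrating over frequency and over log frequency is the point.

```python
import math


def integrate(func, lo, hi, steps=2000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h


def toy_ctf(freq):
    """Hypothetical contrast threshold curve (not from the report).

    Thresholds rise at both low and high spatial frequency.
    """
    return 0.002 * (1.0 / freq + math.exp(0.5 * freq))


def _visibility(contrast, ctf, freq):
    """sqrt(contrast / threshold) where contrast exceeds threshold, else 0."""
    threshold = ctf(freq)
    return math.sqrt(contrast / threshold) if threshold < contrast else 0.0


def linear_metric(contrast, ctf, f_lo, f_hi):
    """Linear integration: weight every frequency interval equally (df)."""
    return integrate(lambda f: _visibility(contrast, ctf, f), f_lo, f_hi)


def log_metric(contrast, ctf, f_lo, f_hi):
    """Logarithmic integration: weight by df / f, normalized by ln 2."""
    total = integrate(lambda f: _visibility(contrast, ctf, f) / f, f_lo, f_hi)
    return total / math.log(2.0)


if __name__ == "__main__":
    # Assumed frequency band in arbitrary cycles-per-unit-angle.
    for contrast in (0.1, 0.3):
        lin = linear_metric(contrast, toy_ctf, 0.05, 15.0)
        log = log_metric(contrast, toy_ctf, 0.05, 15.0)
        print(f"contrast {contrast}: linear = {lin:.2f}, logarithmic = {log:.2f}")
```

Because the logarithmic form divides by frequency, it gives relatively more weight to low spatial frequencies, while the linear form credits each increment of visible bandwidth equally.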

The  organization  of  this  report  is  outlined  as  follows.  The  next  two  sections  provide 
needed  background.  Part  2  is  on  model  history;  this  section  discusses  the  assumptions 
upon  which  models  of  the  last  half-century  were  based.  Part  3  discusses  some  of  the 
remarkable  properties  of  human  vision  and  describes  how  vision  is  characterized  in  our 
model.  Part  4  describes  how  the  hardware  characteristics  of  the  sensor  and  display  are 
combined  with  the  limitations  of  the  eye  to  form  a  model  of  threshold  vision  through  an 
imager.  The  target  acquisition  tasks  predicted  by  the  model  are  defined  in  Part  5.  Part  6 
describes  how  an  image  quality  metric  is  used  to  relate  the  quality  of  threshold  vision 
through  an  imager  to  the  probability  of  acquiring  a  target  at  range.  Part  7  discusses  how 
sampled-image  artifacts  affect  performance.  Parts  8  and  9  present  details  on  the  models 
for  reflected-light  and  thermal  imagers,  respectively. 

Appendix  A  summarizes  ID  experiments  which  show  the  problems  with  the  Johnson 
criteria  and  the  robust  performance  of  the  TTP  metric.  Experiments  included  both 
thermal  and  visible  imagery.  Further,  experimentation  was  done  with  well  sampled 
images  and  a  variety  of  MTF  and  noise,  poorly  sampled  imagers,  and  imagers  with  high 
frequency  boost  and  colored  (spectrally  non-uniform)  noise.  Appendix  B  discusses 
experiments  run  with  very  low  contrast  targets.  Appendix  C  describes  a  recognition 
experiment. This experiment was done for two reasons. First, it checked the TTP metric
on a task easier than target identification. Second, the experiment checked
that the sampling range adjustment is correct for target recognition. Appendix D
describes  an  ID  experiment  where  the  images  were  corrupted  by  laser  speckle.  The 
experiment  is  significant  because  laser  speckle  has  a  very  non-uniform  power  spectrum; 
the  imagery  is  highly  corrupted  with  low  frequency,  very  high  contrast  noise.  Appendix  E 
provides  some  details  needed  to  implement  the  target  acquisition  model. 

History of Target Acquisition Modeling

Electro-optics  (EO)  technology  has  flourished  since  World  War  II;  even  a  brief  mention 
of  all  the  important  contributors  and  events  would  require  volumes.  The  present 
discussion is focused on human-in-the-loop target acquisition models, and only the major
threads  of  model  development  history  are  followed. 

The  history  of  modeling  EO  imagers  traces  back  almost  60  years  to  the  pioneering  work 
of  Otto  Schade.  In  the  introduction  of  his  four-part  paper  “Electro-Optical  Characteristics 
of  Television  Systems,”  Schade  noted  that  the  standard  which  must  be  met  by  an  EO 
image  was  established  by  the  capabilities  and  optical  characteristics  of  the  eye  (Schade, 
1948).  He  pointed  out,  for  example,  that  the  visibility  of  “grain”  fluctuations  decreases 
with  brightness,  so  a  comparison  of  the  signal  to  noise  characteristics  of  different  imaging 
technologies  should  be  made  at  the  same  display  brightness  level.  The  following  is  a 
quote  from  the  conclusion  of  Part  IV. 

“The  quality  of  television  and  photographic  images  depends  in  a  large 
measure  on  three  basic  characteristics  of  the  imaging  process:  the  ratio  of 
signals  to  random  fluctuations,  the  transfer  characteristic,  and  the  detail 
contrast  response.  These  characteristics  are  measured  and  determined  by 
objective  methods  which  apply  equally  well  to  all  components  of 
photographic  and  electro-optical  imaging  systems.”  He  states  that 
hardware  can  be  rated  on  an  objective  numerical  basis,  and  then  continues: 

“An  interpretation  of  the  numerical  values  obtained  by  calculation  or 
measurement  of  the  three  characteristics  that  determine  image  quality 
requires  correlation  with  the  corresponding  subjective  impressions: 
graininess,  tone  scale,  and  sharpness.  This  correlation  has  been  established 
by  analyzing  the  characteristics  of  vision  and  by  including  these 
characteristics  in  an  evaluation  of  the  over-all  process  of  seeing  through  an 
image-reproducing  system.” 

In  1956,  Schade  published  a  model  of  the  eye  (Schade,  1956).  The  visual  system  was 
treated  as  an  analog  camera;  performance  of  the  camera  was  quantified  using  sinewave 
response,  contrast  sensitivity,  and  other  psychophysical  data.  Schade  combined  the 
physical  data  on  hardware  and  psychophysical  data  on  human  vision  and  created  a 
holistic  model  of  the  observer’s  aided  vision.  Schade  postulated  that,  for  each  retinal 
illumination,  information  transfer  could  be  calculated  by  a  knowledge  of  threshold  signal 
to  noise  ratio  and  signal  transfer  characteristics.  Over-all  transfer  characteristics  were 
obtained  by  integration  of  intensity  steps  and  by  considering  the  sampling  efficacy  of  the 
rods  and  cones;  this  integration  of  “statistical  units”  constituted  his  passband  metric.  He 
used this model to compute the degradation in visual performance when the imager was
inserted between the scene and eye.

“One of the objects for constructing an analog [of the eye] is its use for
obtaining visual evaluations for image characteristics by calculation, to
eliminate subjective observations. This calculation is done by computing
the degradation in visual response when an external process is inserted
between the object and the eye. The degradation in resolution, for
example, is given by the ratio of two line numbers obtained at a given
small response factor; one with the eye alone, and the other for the eye in
cascade with the external imaging process. The total degradation may be
rated by the logarithm of the ratio of the equivalent passbands [the normal
visual passband and the combination eye-imager passband].”

Schade’s  work  provided  fundamental  and  widely  accepted  design  guidelines  for 
television  and  other  EO  systems.  However,  Schade’s  sensor  performance  model  was 
complex  and  difficult  to  adapt  to  changing  conditions.  Although  his  model  was  widely 
studied, it was not widely used. To our knowledge, the ability of Schade’s analog eye
model to predict target acquisition performance was never assessed. However, based on
the form of the passband model, our experiments indicate that it would not be a good
predictor of target acquisition performance. Schade later simplified his passband metric to
include only integration over the sensor MTF (Schade, 1973). The simplified version of
the  passband  metric  was  evaluated  by  Task  and  did  not  accurately  predict  target 
acquisition  performance  (Task,  1976;  Tannas,  1985). 

Meanwhile,  the  model  that  was  eventually  used  by  virtually  everyone  for  the  next  fifty 
years  was  being  developed  by  Coltman  (1954).  Coltman  developed  a  model  to  predict  the 
resolving power of fluoroscopes. Richards adopted Coltman’s model to predict the
resolving  power  of  night  vision  imagers  (Richards,  1967).  Johnson  postulated  that  target 
acquisition  performance  using  an  imaging  sensor  was  proportional  to  the  resolving  power 
of  the  imager  (1958).  A  modified  version  of  the  Coltman/Richards  model  for  the  imager 
and  the  Johnson  model  for  predicting  target  acquisition  range  were  brought  together  by 
Ratches,  Lawson,  and  others  in  the  Night  Vision  Laboratory  Static  Performance  Model 
(Ratches  1975,  1976,  and  2001).  The  NVL  model  used  Fourier  transform  theory  and 
communications  theory  concepts  which  were  fully  developed  for  imaging  sensors  by 
Lawson  (1971). 

Derivatives of Coltman’s model are so widespread that the model is generally presented
without attribution. The simple assumptions which are the basis for the model are seldom
questioned. This is unfortunate. Coltman’s focus was fluoroscopy, and his model requires
that  the  display  be  optimized.  He  reasoned  as  follows. 

“The  advent  of  electronic  devices  for  brightening  images  has  made  it 
possible  in  principle  to  remove  the  optical  and  physiological  deficiencies 
of  the  eye.  In  the  limit  there  will  remain  only  the  quantum  noise  inherent 
in  the  signal  itself.” 

Coltman based his model on ideas put forward by Barnes and Czerny (1933) and de Vries
(1943) and fully developed by Rose (1948). Rose assumed that the absorption of
luminous flux by photoreceptors of the retina would be accompanied by the same
statistical fluctuations (shot noise) as occur in any square-law detector. He considered
only low light level circumstances where quantal fluctuations could be expected to
dominate the detection process. Further, he considered circular disks of sufficient size to
be resolved by the eye. Under these circumstances, Rose assumed that the eye would
integrate the signal and noise over the disk area. The result predicts Piper’s law; for a
given adapting luminance, the product of signal threshold and the angular size of the disk
is a constant. When compared to experimental data, Rose’s theory worked for
intermediate sized disks but failed for both small and large disks. The detection of small,
circular disks is predicted by Ricco’s law; for small objects, the product of signal
threshold and disk area is a constant. For large objects, detection occurs at a constant
contrast.
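
The three size regimes described above can be written compactly. The following is only a sketch and is not taken from the original report: the symbols C_t (threshold contrast), A (disk area), d (disk angular diameter), \bar{n} (mean number of absorbed quanta per unit area in one integration time), and k (threshold signal-to-noise ratio) are notation assumed here for illustration, and all relations hold at a fixed adapting luminance.

```latex
% Rose's fluctuation argument: signal and shot noise are both summed over the disk area A
\mathrm{SNR} = \frac{C\,\bar{n}A}{\sqrt{\bar{n}A}} = C\sqrt{\bar{n}A}

% Detection occurs when SNR = k, so threshold contrast falls as 1/sqrt(A), i.e. as 1/d
C_t = \frac{k}{\sqrt{\bar{n}A}}
\quad\Rightarrow\quad
C_t \, d = \text{const} \qquad \text{(Piper's law, intermediate disks)}

% Observed behavior outside the range where the simple theory works
C_t \, A = \text{const} \qquad \text{(Ricco's law, small disks)}
\qquad\qquad
C_t = \text{const} \qquad \text{(large disks)}
```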

Coltman postulated that shot noise in the eye would not be significant compared to the
photo-detection noise associated with the fluoroscope. He assumed a big, bright display
and that noise in the sensor photo-detection process would dominate the perceptual signal
to noise because of the gain provided by the display. Realizing that bar-pattern detection
might be mediated by different perceptual processes than circular disk detection, Coltman
assumed that the visual system acted as a spatial integrator over an area related to the
object to be detected, admitting noise from the same area. He did not assume signal
and noise integration over a single bar. Finally, he followed Rose’s assumption that, for a
detection to occur, a constant signal-to-noise ratio threshold must be achieved at some
point in the visual processing chain. In Figure 2.1, the eye is summing both signal and
noise over an area related to the bar size. Once the integrated signal exceeds the noise by
a fixed threshold, the observer can differentiate between the bar and the space.


Figure  2.1  The  Eye  as  a  Spatial  Filter 
Coltman’s sensor model assumed that the eye integrates signal and
noise over some area of the image related to the bar size.

Coltman tested his theory experimentally. Observer variability was too great to
conclusively demonstrate the validity of his assumptions, but neither did the data indicate
that his assumptions were in error. Most analysts accepted the tenets put forward in the
Coltman model. That is, the signal contrast needed to detect a bar pattern varied
inversely with the square root of bar area. In the presence of noise, large bars were easier
to see than small bars.

Coltman did not postulate that the eye was integrating over a single bar of the pattern; he
could not determine the actual shape or size of the area being integrated. In Coltman’s
experiment, the eye could be using any fraction or multiple of the bar or bars to establish
signal to noise. The signal to noise ratio threshold (SNRT) required by the eye to see the
bar  is  an  experimental  result.  Increasing  integrated  area  by  a  factor  of  four  reduces  SNRT 
by  a  factor  of  two.  Since  SNRT  is  not  known  independent  of  the  experiment,  the 
integration  area  cannot  be  predicted.  By  the  same  logic,  for  white  noise,  the  shape  or 
spatial  weighting  of  the  integrated  area  cannot  be  predicted.  The  nature  of  the  spatial 
filter  was  not  established  by  Coltman’s  experiment. 

Richards  (1967)  adopted  Coltman’s  theory  to  model  night  vision  devices.  He  simplified 
Coltman’s  equation,  and  made  it  appear  more  definitive,  by  assuming  that  the  eye 
integrated  over  the  area  of  a  single  bar.  Coltman  explicitly  included  an  arbitrary 
multiplier  that  flagged  the  ambiguous  nature  of  his  results.  In  simplifying  Coltman’s 
equation,  Richards  set  the  arbitrary  multiplier  equal  to  one.  This  theory,  that  the  eye  is 
integrating  over  the  bar  area,  later  became  known  as  the  “matched  filter”  model. 

Experiments  like  those  of  Coltman  and  later  Rosell  (Rosell,  1973  and  Rosell,  1979) 
demonstrate  that  the  eye  filters  noise,  but  do  not  definitively  establish  a  filter  function. 
The  calibration  constant  (SNRT)  adapts  the  model  to  any  shape  and  placement  of  the 
filters  in  the  frequency  domain,  providing  that  bandwidth  is  proportional  to  bar  frequency 
and  that  the  noise  is  white. 

The  matched  filter  model  was  simple  and  seemed  to  explain  observed  behavior.  The 
eye’s  remarkable  ability  to  see  objects  in  noise  has  been  experienced  by  many  engineers 
over  the  years;  this  lends  credence  to  the  idea  that  the  eye  is  spatially  integrating  over  the 
object  being  viewed.  This  model  became  the  basis  of  the  NVL  (later  the  Night  Vision  and 
Electronic Sensors Directorate or NVESD) performance models until 1995. In the 1975
to  1995  model,  the  eye  acted  as  a  matched  filter,  integrating  signal  over  the  bar  area,  and 
admitting  noise  from  the  same  area.  The  bar  was  detected  (threshold  reached)  when  the 
peak  signal  to  RMS  noise  ratio  exceeded  a  fixed  value  (SNRT)  independent  of  bar  size. 
The noise arose solely from the detector; as detector noise approached zero, so did the
predicted threshold.

The NVL model did include a pupil-dependent eye MTF factor that was added to account
for  vision  limitations;  the  model  also  included  a  factor  representing  temporal  signal 
integration  by  the  eye  that  depended  on  display  luminance.  These  factors  were  added  to 
overcome  the  assumption  of  an  “optimized”  display.  However,  pupil  dilation  plays  a 
minor  role  in  luminance  adaptation.  These  additions  did  not  change  the  fundamental 
nature  of  the  model;  the  contrast  threshold  limitations  of  the  eye  were  ignored. 

The  models  predicted  Minimum  Resolvable  Temperature  (MRT)  for  thermal  imagers  and 
Minimum  Resolvable  Contrast  (MRC)  for  imagers  of  reflected  light.  However,  these 
models  were  only  accurate  for  some  imagers.  Early  thermal  imagers,  for  example,  were 
noisy  and  had  sufficient  gain  that  the  noise  itself  could  generate  a  photopic  or  near 
photopic display luminance. These sensors met the assumptions laid down by Coltman;
the display could be sufficiently optimized that the imager was detector noise limited.
Other technologies, however, could not be modeled. Early attempts to model image
intensifiers  failed,  because  the  eyepiece  luminance  of  the  device  was  low  mesopic.  The 
display  could  not  be  optimized,  and  eye  limitations  could  not  be  ignored.  Further,  the 
performance  of  day  sensors  could  not  be  modeled.  Daylight  illumination  provided  plenty 
of  signal  and  detector  noise  became  insignificant;  performance  was  contrast  limited. 
Since  the  early  NVL  models  were  strictly  based  on  a  signal  to  detector  noise  calculation, 
contrast  limited  situations  were  not  correctly  modeled. 

Alternative  assumptions  about  the  nature  of  the  eye  filter  received  some  attention. 
Sendall  and  Rosell  proposed  substituting  the  synchronous  integrator  model  (Sendall, 
1979;  Rosell,  2000).  However,  under  practical  conditions,  the  predictions  of  the  matched 
filter  model  and  the  synchronous  integrator  model  differ  only  slightly  (Lawson,  1979). 
Overington  (1976)  proposed  using  the  signal  and  noise  associated  with  the  boundary 
rather  than  the  area.  He  suggested  that  gradients  in  the  contour  are  important  and  should 
be weighted by their visibility. The static predictions in British Aerospace’s Oracle
Model use these concepts, but the details of the Oracle Model implementation have not
been  published  to  our  knowledge. 

It  was  recognized  by  a  number  of  researchers  that  the  Coltman  model  ignored 
fundamental  limitations  of  the  eye.  Schnitzler  (1973)  modeled  the  “noise-required  input 
contrast”  of  a  displayed  target  by  cascading  the  quantal  limitations  of  the  EO  imager  and 
eye.  Overington  paid  a  great  deal  of  attention  to  the  workings  of  the  eye,  emphasizing  the 
presence  of  noise  and  blur  both  external  and  within  the  eye.  He  proposed  an  equation  of 
vision  which  was  a  function  of  the  threshold  intensity  difference  divided  by  adapting 
luminance  (the  psychometric  contrast).  Object  detection  depended  upon  intensity 
gradients  in  the  displayed  image,  with  the  gradient  spacing  defined  by  eye  receptors  and 
gradient  amplitude  scaled  by  the  psychometric  contrast.  Overington  provides  alternate 
formulas  for  small,  intermediate,  and  large  objects  and  for  the  effect  of  the  blur 
associated  with  visual  aids.  However,  he  does  not  model  the  effect  of  system  related 
noise. 

Because  of  the  failure  of  the  “standard”  model  to  predict  image  intensifier  performance, 
model  development  continued  at  NVL  also.  The  work  was  probably  done  by  Kornfeld 
and  Lawson  using  ideas  put  forward  by  van  Meeteren  (1986),  but  the  available  working 
papers  are  not  signed  and  do  not  cite  references.  In  this  model,  “eye  noise”  is  assumed  to 
be  proportional  to  the  contrast  threshold  function  of  the  eye.  The  eye  noise  is  root-sum- 
squared  with  the  signal  to  noise  term  used  in  the  Ratches  model.  This  addition  to  the 
NVL  model  was  significant  because  the  modified  model  provided  correct  results  in  the 
limit  of  zero  detector  noise.  That  is,  as  the  system  noise  decreased  to  zero,  the  eye 
became  the  limiting  factor.  However,  this  addition  made  little  difference  in  model 
predictions;  image  intensifier  predictions  were  still  quite  inaccurate.  An  image  intensifier 
model  was  eventually  published  by  NVL;  the  IIV4  Image  Intensifier  model  was 
published  in  1995.  However,  that  model  used  empirical  fits  to  laboratory  data  in  an 
attempt  to  correct  the  problems  with  the  theory. 
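
The limiting behavior described above can be illustrated with a short sketch. It is not the NVL implementation, whose details are not given here; the function name, its arguments, and the idea of expressing both terms as contrast thresholds are assumptions made purely for illustration. The only point shown is that a root-sum-square combination reduces to the eye term when detector noise vanishes and to the detector term when detector noise dominates.

```python
import math


def combined_threshold(eye_term: float, detector_term: float) -> float:
    """Root-sum-square combination of two threshold contributions.

    eye_term      -- hypothetical eye-limited contribution (assumed proportional
                     to the eye's contrast threshold function at one frequency)
    detector_term -- hypothetical detector-noise-limited contribution
    """
    return math.hypot(eye_term, detector_term)


# With no detector noise, the eye alone sets the threshold.
eye_only = combined_threshold(0.01, 0.0)

# With large detector noise, the eye term contributes very little.
noise_dominated = combined_threshold(0.01, 0.5)
```

Because the terms add in quadrature, the smaller of the two has little influence unless the terms are comparable in size.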

One  concept  fundamental  to  all  of  the  above  theories  is  that  the  signal  was  detected  when 
it  exceeded  the  shot  noise  by  a  fixed  amount.  The  noise  was  sometimes  only  the  shot 
noise  associated  with  the  sensor  photo-detection,  and  sometimes  the  shot  noise  was 
modeled as the combined noise from sensor and eye neural noise. The signal might
represent detecting a bar or circular disk against a bland background; in this case, the
models were called Minimum Detectable Temperature or Minimum Detectable Contrast.
When calculating Minimum Resolvable Temperature and Minimum Resolvable Contrast,
the signal was the bar-space-bar modulation of a bar pattern. Whether the model was
predicting the presence of an object in noise or detection of bar modulation, both types of
model employed the same assumptions. First, the visual system integrated over the bar or
simple object. Second, the SNRT was constant regardless of the size of the object or bar
pattern. Third, the eye noise, when considered, was associated with primary
photo-detection by the eye; therefore, the eye noise was proportional to the square root of
display luminance.

The history above has focused on the detection of simple, circular disks or bar-patterns
through an imager. An equally important factor in target acquisition is relating the
detection of simple patterns to the process of interpreting real, complex images. Although
the Johnson criteria are used almost universally, many alternatives have been proposed.

Rosell used the matched filter concept to calculate sensor resolution; however, he felt that
the Johnson criteria range predictions were imprecise (Rosell, 1979; Rosell, 2000). The
Johnson metric tends to be optimistic when the image is noisy. That is, more “cycles on
target” are needed to perform an acquisition task when the imagery is limited by noise
rather than blur. Rosell suggested adjusting the Johnson range predictions based on the
signal to noise established by target contrast at range and the sensor’s noise equivalent
temperature difference. The resulting range model was somewhat clumsy to implement.
The validity of Rosell’s criticism was widely understood, however, and alternatives to the
Johnson model were pursued by Rosell, Biberman, and others for many years (Biberman,
2000). The model by Roberts, Biberman, and Deller is described here as an example.

A  fixed  resolution  on  the  target  is  selected;  this  is  the  cycles  across  target  to  achieve  a  0.5 
probability  of  task  performance.  For  each  range,  the  known  target  size  and  required 
number  of  cycles  across  the  target  yields  a  spatial  frequency.  The  MRT  curve  is  used  to 
find  the  threshold  contrast  needed  to  resolve  that  frequency.  A  signal  to  noise  ratio  is 
formed  based  on  target  apparent  contrast  and  the  MRT  threshold  contrast.  The  probability 
of  task  performance  is  then  based  on  that  signal  to  noise  ratio.  However,  according  to 
well  documented  but  unpublished  evaluations  by  Lawson  and  Johnson,  these  alternatives 
never  proved  as  successful  as  the  Johnson  criteria  in  estimating  field  performance. 
Models  based  on  Rosell’s  concept  tend  to  predict  very  high  probability  out  to  the  range 
where  the  Johnson  criteria  would  predict  0.5  probability.  At  that  range,  probability  drops 
abruptly  to  zero.  While  it  has  been  argued  that  this  is  realistic  for  poor  weather,  clear 
weather  predictions  follow  the  same  trend.  A  sharp  drop  in  acquisition  probability  is  not 
observed  in  practice. 
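
The first steps of the procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the published model: the exponential MRT curve, the exponential atmospheric loss, and every default parameter value are hypothetical placeholders. The mapping from signal to noise ratio to probability is not specified in the text above, so it is deliberately left out.

```python
import math


def required_frequency(range_km: float, target_size_m: float,
                       cycles_on_target: float) -> float:
    """Spatial frequency (cycles/mrad) that places the required cycles on target."""
    target_angle_mrad = target_size_m / range_km  # metres / kilometres = milliradians
    return cycles_on_target / target_angle_mrad


def hypothetical_mrt(freq_cy_per_mrad: float, mrt0: float = 0.02,
                     rolloff: float = 0.6) -> float:
    """Placeholder MRT curve (kelvin); a real curve is measured or modeled per sensor."""
    return mrt0 * math.exp(rolloff * freq_cy_per_mrad)


def apparent_delta_t(delta_t0: float, range_km: float,
                     transmission_per_km: float = 0.85) -> float:
    """Target contrast at range, assuming a simple exponential atmospheric loss."""
    return delta_t0 * transmission_per_km ** range_km


def snr_at_range(range_km: float, target_size_m: float = 2.3,
                 cycles_on_target: float = 4.0, delta_t0: float = 2.0) -> float:
    """Ratio of apparent target contrast to the MRT threshold at the required frequency."""
    freq = required_frequency(range_km, target_size_m, cycles_on_target)
    return apparent_delta_t(delta_t0, range_km) / hypothetical_mrt(freq)
```

With placeholder curves of this kind, the ratio falls steeply as range increases, because the required frequency rises while the apparent contrast falls.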

There  are  a  number  of  MTF-based  measures  of  image  quality.  The  Johnson  metric  is  one 
of  these,  as  are  Modulation  Transfer  Function  Area,  Integrated  Contrast  Sensitivity, 
Square Root Quality Index, and many others. The Targeting Task Performance (TTP) metric
described later in this report is also an MTF-based measure of image quality. The idea for
these metrics started with Schade’s equivalent passband. See Task (1976), Tannas (1985),
Snyder (1973, 1988), Beaton (1991), and Biberman (1973, 2000/Chapter 22) for surveys
of this area. These metrics share the concept that image quality can be quantified by some
weighted integral of signal modulation which exceeds the eye contrast threshold. For
example, the Johnson frequency is defined by the spatial frequency range over which the
apparent target contrast exceeds the eye threshold. For the other metrics, the amount that
the signal modulation exceeds threshold at each spatial frequency is important. All of
these metrics share the virtue that range prediction is easily implemented; in every case,
range is simply proportional to the value of the metric.
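
As a minimal sketch of how a limiting-frequency metric turns into a range prediction, the code below finds the highest spatial frequency at which displayed target modulation still exceeds the eye threshold and then scales it to range. The function names, the bisection search, and the assumption that the modulation curve crosses the threshold only once are illustrative choices, not part of the original model; any MTF and threshold functions supplied by the caller are likewise assumed.

```python
from typing import Callable


def limiting_frequency(apparent_contrast: float,
                       mtf: Callable[[float], float],
                       eye_threshold: Callable[[float], float],
                       f_max: float = 20.0, tol: float = 1e-6) -> float:
    """Highest frequency (cycles/mrad) where contrast * MTF exceeds the eye threshold.

    Uses bisection and assumes the margin changes sign exactly once on [0, f_max].
    """
    def margin(f: float) -> float:
        return apparent_contrast * mtf(f) - eye_threshold(f)

    if margin(0.0) <= 0.0:
        return 0.0          # target is not visible at any frequency
    if margin(f_max) > 0.0:
        return f_max        # still above threshold at the search limit
    lo, hi = 0.0, f_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if margin(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def predicted_range_km(metric_cy_per_mrad: float, target_size_m: float,
                       cycles_required: float) -> float:
    """Range at which the target subtends the required number of cycles.

    Target angle in mrad is size_m / range_km, so range is proportional to the metric.
    """
    return metric_cy_per_mrad * target_size_m / cycles_required
```

Doubling the metric value doubles the predicted range, which is the proportionality the text refers to.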

Researchers in the field have found that, in general, MTF-based metrics account for
more than half the variance in performance across the various displays tested. Although
the correlation between a particular metric and performance varies greatly from
experiment to experiment and task to task, limiting resolution measures like the Johnson
metric are generally among the worst performers. However, prior to the TTP metric,
experiments at NVL have shown the Johnson criteria to perform better than Modulation
Transfer Function Area, Integrated Contrast Sensitivity, and other metrics evaluated
(Vollmerhausen, 07/2000).

The discrepancy in experimental conclusions appears to be based on the form of the
analyses. Most researchers change one or more calibration “constants” to fit calculated
metric values to experimental data. They argue that changes in task, observation
conditions, and observer-to-observer physiology require that the metric be uniquely
adapted to each experiment. From the standpoint of a target acquisition model, however,
such a procedure cannot be used to predict performance; the procedure requires
experimental data on which to base a fit. While all models have one or more calibration
constants, those constants must be determined once and then used for all predictions.
Under those constraints, the Johnson criteria have proven to be a reasonable predictor of
performance, better than other MTF-based metrics like MTFA, ICS, and SQRI.

Overington (1976) and van Meeteren (1990) both theorized that targets are recognized by
a process of detecting critical features. This general concept has been the focus of several
researchers (Biederman, 1987; O’Kane, 2000). The van Meeteren model, as summarized
by Vos and van Meeteren (1991), will be described as it is the most complete in terms of
predicting range performance.

Target acquisition is determined by a process of detecting characteristic details. The size,
contrast, and number of characteristic details visible to the observer determine the
probability of acquisition. Each detail is treated as a circular disk with detection based on
a Minimum Detectable Contrast model. In van Meeteren’s model, eye noise is
represented as a fixed fraction of the contrast threshold at each luminance level.
Detection of the critical detail is based on the contrast signal exceeding the quadrature
combined detector and eye noise by a fixed amount. One continuing problem with these
models is the pre-determination of critical features. It is difficult to adapt the model to a
new target set.

One important aspect of van Meeteren’s work is explicit task definition. In his 1990
JOSA paper, he describes target recognition as choosing an object from a known
confusion set. That is, targets are recognized by differentiating them from the possible
alternatives. This means that the features which uniquely define a target are those which
differentiate that target from others in the set. Most researchers ignore the important step
of defining the experimental task. By not recognizing that all discriminations are
comparisons, many researchers fall into the trap of analyzing experimental data
one-target-at-a-time. The result is that a model which appears to predict one experiment
beautifully fails to predict subsequent experiments. This distinction, that target
acquisition models predict the ability to choose one target of a set rather than
the absolute probability of recognizing or identifying a particular target, is particularly
important when trying to assess the success of feature-based models.

The logic of critical-feature recognition is intellectually appealing, but a practical model
which incorporates target-set features to predict range performance has not been offered.
It should be noted, however, that accepting the idea that high level visual discriminations
are required to recognize targets does not invalidate image quality models. Accepting an
image quality model like the Johnson criteria does not imply that we are not looking at
internal target features. The target is not in the model; no judgment is being made about
what is being viewed.

The inclusion of overall target-set dimensions and average contrast in range predictions
tends to confuse this point. Those parameters are used to decrease variance in range
predictions. For example, for ship identification, the “critical dimension” is found to be
the vertical height of the ships to be discriminated, and this is included in the model.
Tactical military vehicles have more observable features with a side view than front
view; the road wheels and gun, for example, are better viewed from the side. The current
use of square root of target area when making predictions for tactical vehicles has been
found to adjust model output in the correct way. In general, larger, higher contrast targets
are easier to see, and including these factors in the model decreases variance in the
predictions. However, the fundamental concept behind an image quality model is: see
better, see further. Of course target details are used to recognize and identify targets.
Better image quality lets the observer make these discriminations at longer range.

The fluctuation theory developed by Rose provides a limiting criterion for detection
under low luminance conditions. The basic assumption, quite correct for very low display
luminance, is that liminal vision is established by the shot noise associated with the
retinal photo-detection mechanism. In virtually all cases where target acquisition
modelers have considered the nature of the eye, they have assumed that shot noise
established the significant limitations of eyesight. This is not the case. The target
acquisition task is dependent on the characteristic behavior of higher-order visual
processing within the brain.

For any practical display luminance, the contrast limitations of the human eye are
established by the visual cortex, not the retina. In 1995, the NVL model which predicts
threshold resolution versus spatial frequency was modified to account for these contrast
limitations (Vollmerhausen, 1995 and 2000; Driggers, 1999). The modified model
accurately predicts image intensifier performance over a wide range of scene illumination
and eyepiece luminance conditions. Further, as will be discussed, this model does an
outstanding job of predicting the results of experiments using thermal, visible, and laser
imagery. The model is applicable to the whole range of EO imagers.

However, the 1995 NVL model continued to use the matched filter concept. Like all
models using this filter assumption, it can only be used with sensors where the noise is
essentially white (flat) over the frequency spectrum of the signal. While this can be a
serious limitation with modern sensors, the limitation was not serious for previous
generations of EO imagers.

Prior to the widespread use of sampled imagers and digital processing, one could assume
sensor noise to be essentially white in comparison to the signal. The scene was filtered by
the optics and detector as well as the electronics, display, and eye. The noise was only
filtered by the electronics, display, and eye. The white noise assumption was valid for the
vast majority of sensors. It is still true today that the noise in most EO imagers is white
compared to the signal spectrum. In modern sensors, however, digital enhancement of the
image can make the noise distinctly non-white. An upgrade to the model is needed.

An upgrade to the 1995 NVL model to correct the eye filters is presented in this report. In
this model, the matched filters are replaced with bandpass filters. The new eye filters are
based on psychophysical data collected over the last three decades. The combination of
the new eye filters and the TTP metric provides complete flexibility in modeling modern
EO imagers.

Modeling Human Vision

In  this  section,  some  of  the  marvels,  complexities,  and  limitations  of  human  vision  are 
described.  The  nature  of  the  immediate  task  requires  us  to  focus  on  limitations.  However, 
recognizing  the  capability  and  the  resulting  complexity  of  eyesight  provides  a  needed 
insight;  the  nature  of  vision  cannot  be  encompassed  in  a  simple  model.  As  shown  in  the 
charts  below,  eye  behavior  changes  significantly  with  luminance  and  with  angular 
eccentricity  from  the  fovea.  This  explains  why  the  theory  in  this  report  treats  the  human 
visual  system  as  a  “black  box.”  The  threshold  response  of  the  eye  to  sinewave  gratings  is 
used  to  characterize  vision;  this  is  experimental  data  collected  by  psychophysicists. 

The  eye  provides  some  quality  of  vision  over  a  billion  to  one  range  of  scene  illumination. 
To  accomplish  this,  the  eye  has  cones  for  photopic  or  daytime  vision  and  rods  for 
scotopic  or  night  vision.  The  distribution  of  rods  and  cones  within  the  eyeball  is  shown  in 
Figure  3.1.  The  highest  density  of  cones  is  at  the  center  of  the  fovea,  called  the  foveal  pit. 
There  are  no  rods  in  the  foveal  pit,  a  region  in  the  center  of  the  retina  about  200  microns 
in  diameter.  The  foveal  pit  subtends  about  a  quarter  inch  on  a  display  viewed  from  15 
inches. 

Figure 3.1 Distribution of rods and cones in the retina of the eye.
(Figure courtesy Webvision)


Rods  and  cones  are  not  equally  sensitive  to  visible  wavelengths  of  light.  Unlike  the 
cones,  rods  are  more  sensitive  to  blue  light  and  are  not  sensitive  to  wavelengths  greater 
than  about  640  nanometers,  the  red  portion  of  the  visible  spectrum. 

Although factors like retinal processing and pupil dilation play important roles,
photopigment bleaching is the primary means for adapting both rods and cones to varying
illumination. A threshold versus intensity (tvi) curve can be obtained by testing observers
using  a  small  disk  of  light  against  a  uniform  luminance  background.  When  rods  or  cones 
are  isolated,  four  sections  of  the  tvi  curve  are  apparent:  dark  light,  Square  Root  Law  (de 
Vries-Rose  Law),  Weber's  Law,  and  saturation  (Aguilar  and  Stiles,  1954).  Figure  3.2 
shows  a  tvi  curve  for  rod  vision.  The  figure  plots  the  just-visible  difference  in  luminance 
(ordinate)  versus  the  display  luminance  (abscissa).  “Dark  light”  is  internal,  neural  noise. 
The  second  part  of  the  tvi  curve  is  limited  by  quantal  fluctuation;  this  is  the  square  root 
law  or  de  Vries-Rose  Law  region.  The  next  section  of  the  curve  follows  Weber’s  Law; 
the  threshold  is  a  constant  fraction  of  luminance.  Given  sufficient  light,  the  eye  operates 
on  the  principle  of  contrast  constancy;  this  is  an  important  feature  of  our  visual  system.  In 
a  natural  scene,  object  to  background  contrast  is  fairly  independent  of  ambient 
illumination.  The  final  part  of  the  tvi  curve  is  saturation  at  high  light  levels. 

According  to  Ricco's  Law,  the  eye  sums  quanta  over  an  area.  Threshold  is  reached  when 
the  product  of  luminance  and  stimulus  area  exceeds  a  constant  value.  In  other  words, 
when  luminance  is  halved,  a  doubling  in  stimulus  area  is  required  to  reach  threshold. 
Summation  area  varies  with  eccentricity.  In  the  fovea,  complete  summation  occurs  over 
about  0.1  degree.  Ricco's  Law  holds  for  an  area  of  a  half  degree  at  5°  eccentricity 
increasing to an area of about 2° at an eccentricity of 35° (Davidson, 1990). Spatial
summation  occurs  due  to  the  convergence  of  photoreceptors  onto  ganglion  cells;  clearly, 
spatial  summation  limits  resolution. 
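
Ricco's Law as stated above can be written as a one-line relation. The sketch below is only illustrative: the constant `k` and the function name are hypothetical, and the relation applies only within the complete-summation area for a given eccentricity.

```python
def ricco_threshold_luminance(area, k):
    """Threshold luminance for a stimulus of the given area under Ricco's Law.

    Ricco's Law: luminance * area = k at threshold (complete spatial summation).
    `k` is a hypothetical constant; its units follow the units chosen for `area`.
    """
    if area <= 0:
        raise ValueError("area must be positive")
    return k / area


k = 1.0                                        # hypothetical summation constant
lum_small = ricco_threshold_luminance(1.0, k)  # threshold for the smaller stimulus
lum_large = ricco_threshold_luminance(2.0, k)  # doubling the area halves the threshold
```

Doubling the stimulus area halves the luminance needed to reach threshold, which is the trade described in the text.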

Visual  acuity  is  the  greatest  at  the  center  of  fixation  and  decreases  with  eccentricity.  See 
Figure  3.3  for  a  plot  of  visual  acuity  versus  eccentricity.  There  is  a  close  correlation 
between cone density and visual acuity out to about 2 degrees (Green, 1970).

Figure 3.2 Threshold versus intensity curve
for rods; similar results are found for cones.
(Figure courtesy Webvision)

Figure 3.3 Plot of visual acuity versus
eccentricity for photopic luminance.
(Figure courtesy Webvision)

The illumination range where both rods and cones work together is called mesopic vision.
The rods saturate at illumination levels above 10 fL; the cones cease to be important in
mediating vision at just below 0.01 fL. The luminance from 0.01 to 10 fL is essentially
the range of display luminance used in military night vision systems. Displays used in
daylight would be brighter, of course. As shown in Figure 3.4, visual acuity varies greatly
over the mesopic range of display luminance. (The milliLambert units used in the figure
are almost equal to fL.)
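
The remark that milliLamberts and footLamberts are almost equal can be checked from the standard unit definitions (1 fL = 1/π candela per square foot; 1 mL = 10/π candela per square meter). The function name below is illustrative only.

```python
import math

FT2_PER_M2 = 1.0 / 0.09290304          # square feet per square meter (1 ft = 0.3048 m)

CD_M2_PER_FL = FT2_PER_M2 / math.pi    # 1 fL = (1/pi) cd/ft^2, about 3.426 cd/m^2
CD_M2_PER_ML = 10.0 / math.pi          # 1 mL = (10/pi) cd/m^2, about 3.183 cd/m^2


def footlamberts_to_millilamberts(fl):
    """Convert luminance in footLamberts to milliLamberts."""
    return fl * CD_M2_PER_FL / CD_M2_PER_ML


ratio = footlamberts_to_millilamberts(1.0)   # roughly 1.08 mL per fL
```

The two units differ by less than 8 percent, which is negligible on the logarithmic luminance axis of Figure 3.4.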


Figure  3.4  Visual  Acuity  at 
Mesopic  Light  Levels 
(Figure courtesy Webvision)

Although the variation in visual acuity with display luminance has been measured, it is
difficult  to  predict.  The  interaction  between  rods  and  cones  is  not  well  understood.  Rods 
and  cones  are  distributed  differently  over  the  retina.  Rods  and  cones  have  different 
spectral  responses,  use  different  photo-pigment  chemistry,  saturate  at  different  light 
levels,  and  employ  different  neural  summation  and  processing  schemes. 


The  limitations  of  human  vision  are  important  when  predicting  the  targeting  performance 
of  an  EO  imager.  However,  a  reliable  theory  for  predicting  visual  behavior  is  not 
available.  In  the  target  acquisition  model,  experimental  data  collected  by 
psychophysicists  are  used  to  describe  human  vision. 


3.1  Contrast  Threshold  Function 


The  Contrast  Threshold  Function  (CTF)  is  one  of  the  most  common  and  useful  ways  of 
characterizing  human  vision.  Objects  and  their  surroundings  are  of  varying  contrast. 
Therefore,  the  relationship  between  visual  acuity  and  contrast  allows  a  better 
understanding  of  visual  perception  than  acuity  measured  only  with  high  contrast  (black 
on  white)  charts. 

In  Figure  3.5,  the  observer  is  viewing  a  sine-wave  pattern.  While  holding  average 
luminance  to  the  eye  constant,  the  contrast  of  the  bar  pattern  is  lowered  until  no  longer 
visible  to  the  observer.  That  is,  the  dark  bars  are  lightened  and  the  light  bars  darkened, 
holding the average constant, until the bar-space-bar pattern disappears. A decrease in
contrast from left to right is shown at top, right in the figure. The goal of the experiment
is  to  measure  the  amplitude  of  the  sinewave  that  is  just  visible  to  the  observer. 


Figure  3.5  Measuring  CTF 


Most published CTF data is taken with two alternative, forced choice (2afc) experiments.
In these experiments, the observer is shown one blank field and one with the sinewave.
The observer must choose which field has the sinewave. These experiments measure the
sinewave threshold where the observer chooses correctly half the time independent of
chance. That is, the 2afc experiment provides the threshold which yields a 0.75
probability of correct choice. The procedure is repeated for various bar spacings; that is,
for various spatial frequencies. See the bottom, right of the figure for an illustration of
spatial frequency; high spatial frequency is at the left, lower spatial frequency to the right.
The  curve  of  contrast  threshold  versus  spatial  frequency  at  each  light  level  is  called  the 
CTF  at  that  light  level. 
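
The 0.75 figure follows from the usual correction for guessing: with two alternatives the observer is right half the time by chance alone, so a true detection rate of one half gives 0.5 + 0.5 × 0.5. A minimal sketch of that arithmetic follows; the function name is illustrative.

```python
def p_correct_2afc(p_detect, guess_rate=0.5):
    """Observed probability of a correct choice in a forced-choice trial.

    p_detect   : probability the observer truly sees the sinewave
    guess_rate : probability of guessing right when it is not seen
                 (0.5 for two alternatives)
    """
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must lie in [0, 1]")
    return guess_rate + (1.0 - guess_rate) * p_detect


p_at_threshold = p_correct_2afc(0.5)   # detection on half the trials, independent of chance
```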

Figure 3.6 shows CTF curves for various adapting luminances; the abscissa is spatial
frequency  and  the  ordinate  is  contrast  threshold.  Each  curve  shows  CTF  for  a  different 
light  level  to  the  eye.  Remember  that  these  curves  use  modulation  to  describe  contrast; 
that  is,  contrast  equals  (bright  -  dark)/(bright  +  dark). 
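
For reference, the modulation definition of contrast quoted above can be written directly. The function name is illustrative.

```python
def modulation_contrast(bright, dark):
    """Modulation contrast: (bright - dark) / (bright + dark)."""
    total = bright + dark
    if total <= 0:
        raise ValueError("bright + dark must be positive")
    return (bright - dark) / total


unity = modulation_contrast(1.0, 0.0)   # fully dark bars give unity contrast
half = modulation_contrast(3.0, 1.0)    # bars at 3 and 1 luminance units
```

With this definition, contrast is bounded by one, which is why the CTF curves in Figure 3.6 end where they reach unity contrast.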


Figure 3.6 CTF of Eye at Various Light Levels. Luminance values are in footLamberts.
[Plot: contrast threshold (ordinate) versus spatial frequency in cycles/mrad (abscissa); one curve per adapting luminance, with curve labels from 0.0001 fL to 100 fL.]

At each light level, the limiting resolution is the frequency where the CTF curve crosses
unity contrast. Limiting resolution provides the smallest detail that will be visible at that
light level, and this detail is only visible at the highest possible contrast. CTF provides
much more information than limiting resolution; CTF provides the threshold contrast
value at all spatial frequencies.

Few real-world objects are totally reflective or totally absorptive; contrast is seldom unity
in  a  real-world  scene.  A  typical  scene  consists  of  an  infinitude  of  contrast  gradations.  The 
eye’s  ability  to  see  small  contrast  differences  is  critical  to  quality  vision.  From  the  figure, 
note  that  the  eye  loses  its  ability  to  see  small  contrast  changes  as  cone  vision  is  lost.  The 
CTF  curve  rises  as  light  level  decreases.  This  rise  in  the  CTF  curve  results  in  lower 
limiting  resolution  and  also  results  in  loss  of  the  ability  to  see  small  contrast  differences 
at  any  spatial  frequency.  An  interesting  aspect  of  the  CTF  curves  is  that  at  the  higher  light 
levels,  people  have  better  threshold  vision  at  middle  spatial  frequencies  than  at  low 
spatial  frequencies. 
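
Limiting resolution, defined above as the frequency where the CTF curve crosses unity contrast, can be located numerically from sampled CTF values. The sketch below assumes the samples are ordered by increasing frequency and interpolates linearly in log contrast; the sample values are hypothetical and are not read from Figure 3.6.

```python
import numpy as np


def limiting_resolution(freqs, ctf):
    """Frequency where the rising (high-frequency) branch of the CTF reaches 1.0.

    Returns None if the sampled curve never reaches unity contrast.
    """
    freqs = np.asarray(freqs, dtype=float)
    ctf = np.asarray(ctf, dtype=float)
    start = int(np.argmin(ctf))            # search only above the CTF minimum
    for i in range(start, len(freqs) - 1):
        if ctf[i] < 1.0 <= ctf[i + 1]:
            # Linear interpolation in log10(contrast); log10(1.0) = 0
            y0, y1 = np.log10(ctf[i]), np.log10(ctf[i + 1])
            t = (0.0 - y0) / (y1 - y0)
            return float(freqs[i] + t * (freqs[i + 1] - freqs[i]))
    return None


# Hypothetical samples: frequency in cycles/mrad, threshold as modulation
sample_freqs = [0.1, 0.5, 1.0, 1.5, 2.0]
sample_ctf = [0.01, 0.005, 0.02, 0.1, 10.0]
f_limit = limiting_resolution(sample_freqs, sample_ctf)
```

Raising every CTF value, as happens when the light level drops, moves this crossing to a lower frequency.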

3.2  Contrast  Threshold  Function  in  Noise 

Our  model  for  predicting  the  effect  of  display  noise  on  CTF  was  first  described  by 
Vollmerhausen  (1995,  2000).  This  CTF  model  is  currently  used  in  the  thermal  model  and 
other  EO  imager  models  published  by  the  U.S.  Army.  However,  those  previous 
references  did  not  provide  a  detailed  discussion  of  the  CTF  model  itself.  This  section 
starts  with  some  history  on  modeling  CTF  in  the  presence  of  display  noise,  describes  the 
van  Meeteren  model  which  is  used  quite  often,  and  then  provides  a  discussion  of  the 
current  CTF  model. 

Nagaraja  did  experiments  where  he  found  that  noise  had  an  effect  on  detection  threshold 
which  could  logically  be  explained  by  assuming  that  the  brain  was  taking  the  root-sum- 
square  (RSS)  of  display  noise  and  some  internal  eye  noise  (Nagaraja,  1964).  In  the 
following  equations,  CTFn  is  measured  threshold  modulation  in  the  presence  of  noise,  N 
is  display  noise  modulation  (RMS  noise  divided  by  twice  the  display  luminance),  and  k 
and  Neye  are  parameter  fits. 

$CTF_n^2 = k^2 \left( N^2 + N_{eye}^2 \right)$    (3.1)

Nagaraja then observed that, if Neye and k are constant, then plotting the square of
threshold in noise versus the square of external noise amplitude should give a straight line
with slope k^2 and intercept CTF^2. CTF is the measured threshold modulation without
external noise.

$CTF_n^2 = k^2 N^2 + CTF^2$    (3.2)
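
The straight-line test can be sketched as a least-squares fit of squared threshold against squared noise modulation. The data below are synthetic, generated from the linear relation itself with hypothetical values of k and CTF; they are not Nagaraja's measurements.

```python
import numpy as np


def fit_threshold_vs_noise(noise_mod, ctf_in_noise):
    """Fit CTF_n^2 = slope * N^2 + intercept.

    Under the RSS model the slope estimates k^2 and the intercept
    estimates CTF^2 (the squared no-noise threshold).
    """
    n2 = np.asarray(noise_mod, dtype=float) ** 2
    c2 = np.asarray(ctf_in_noise, dtype=float) ** 2
    slope, intercept = np.polyfit(n2, c2, 1)
    return float(slope), float(intercept)


# Synthetic example with hypothetical parameters
k_true, ctf_true = 0.5, 0.01
noise = np.array([0.0, 0.02, 0.04, 0.08])
thresholds = np.sqrt(k_true**2 * noise**2 + ctf_true**2)
k2_est, ctf2_est = fit_threshold_vs_noise(noise, thresholds)
```

With real data, systematic curvature in the residuals of this fit would indicate that k or Neye is not constant at that luminance.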

When Nagaraja plotted the experimental data, he found that the plot of CTFn^2 versus N^2
was linear at 1 fL but that the plots for 0.1 fL and 0.01 fL were not. So Equation 3.1 was
correct  for  1  fL  but  was  less  accurate  at  lower  display  luminance.  Other  investigators 
have  found  that  Equation  3.1  is  approximately  true  for  a  wide  range  of  conditions  and 
tasks  (Pelli  1981,  Legge  1987,  Van  Meeteren  1988,  Pelli  1999).  Different  tasks  include 
detecting  small  disks  against  a  uniform  background,  reading  letters,  detecting  bar 
patterns,  and  sinewave  threshold  detection. 

Both k and Neye are often stated in the literature to be constants fit to the experimental
data. This means only that they are constant for a given observer and a given stimulus;
if either the observer or the stimulus is changed, both of these factors can change as well.
This means that if an experiment is conducted with several different sinusoidal gratings,
k and Neye will be different for each grating and each observer.

For a limited range of spatial frequency gratings and for photopic luminance, van
Meeteren has demonstrated that k in Equation 3.2 varies slowly; he treats k as a constant.
See Barten (1999) for additional discussion on van Meeteren’s treatment. However, this
is the same assumption as used by Lawson to develop the IIV4 Image Intensifier Model;
this model does not provide good predictions when display luminance is mesopic, which
is almost always the case (Vollmerhausen, 1995).

The  current  model  is  derived  as  follows.  From  Equation  3.1,  at  each  specific  frequency 
and  light  level, 

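$CTF^2 = k^2 N_{eye}^2$    (3.3)

so that

$k^2 = CTF^2 / N_{eye}^2$    (3.4)

Using (3.4) in (3.2)

$CTF_n^2 = CTF^2 \left( 1 + \frac{N^2}{N_{eye}^2} \right) = CTF^2 \left( 1 + \frac{\sigma^2}{n_{eye}^2} \right)$    (3.5)

where σ is the RMS noise on the display and n_eye is the RMS eye noise expressed at the
display.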


Using  Weber’s  Law,  assume  that  eye  noise  is  proportional  to  display  luminance  (L).  This 
proportionality  holds  over  most  of  the  functional  luminance  range  of  the  human  eye 
(Pelli,  1999;  Blackwell,  1958;  Section  1.632  of  Boff,  1988;  Webvision,  2003).  For 
display luminance above the de Vries-Rose Law region and for statically presented
stimuli,  the  visibility  of  foveally  presented  signals  is  limited  by  noise  arising  in  the  cortex 
after  spatiotemporal  and  binocular  integration  (Raghavan,  1989). 


$CTF_n^2 = CTF^2 \left( 1 + \frac{\alpha^2 \sigma^2}{L^2} \right)$    (3.7)
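
A minimal numerical sketch of Equation 3.7 follows, reading it as CTFn^2 = CTF^2 (1 + α^2 σ^2 / L^2). That form results from the preceding steps if the Weber's Law proportionality is written as n_eye = L/α, which is an assumption about notation here. The value of α used in the example is a placeholder, not the calibrated constant.

```python
import math


def ctf_in_noise(ctf, sigma, luminance, alpha):
    """Contrast threshold in display noise: CTF * sqrt(1 + (alpha*sigma/L)^2).

    ctf       : naked-eye contrast threshold at one spatial frequency
    sigma     : eye-filtered RMS display noise (same luminance units as L)
    luminance : average display luminance L
    alpha     : calibration constant (hypothetical value in the example below)
    """
    if luminance <= 0:
        raise ValueError("luminance must be positive")
    return ctf * math.sqrt(1.0 + (alpha * sigma / luminance) ** 2)


no_noise = ctf_in_noise(0.01, 0.0, 1.0, 100.0)    # reduces to the naked-eye CTF
with_noise = ctf_in_noise(0.01, 0.03, 1.0, 100.0) # alpha*sigma/L = 3
```

Because only the ratio σ/L enters, scaling display noise and display luminance together leaves the predicted threshold unchanged.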


It  should  be  remembered  that  eye  noise  is  a  concept  used  to  explain  non-zero  thresholds. 
The  actual  reason  that  the  liminal  signal  is  greater  than  zero  is  not  known.  Rose  and  de 
Vries  correctly  assumed  that  the  statistics  associated  with  photo-detection  limits 
psychometric contrast at low luminance. In this case, signal is proportional to luminance
and noise is proportional to the square root of luminance, so psychometric contrast
decreases  in  inverse  proportion  to  the  square  root  of  luminance.  However,  this 
assumption  only  holds  for  the  lowest  absolute  luminance  needed  for  rod  or  cone 
operation.  At  higher  luminance  levels,  signal  detection  threshold  is  proportional  to 
luminance,  and  psychometric  contrast  is  constant.  The  reason  for  this  change  in  behavior 
as  luminance  increases  might  not  actually  be  noise,  but  rather  an  adaptation  of  the  visual 
system to aid the brain in interpreting imagery. Whatever the cause, Figure 3.2 does
indicate  that  threshold  is  proportional  to  display  luminance  over  most  of  the  luminance 
range  usable  by  the  eye. 
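
The two regimes described in this paragraph can be written compactly. The symbols S (signal), n (noise), ΔL_th (threshold luminance difference), and C_th (threshold contrast) are introduced here only for illustration.

```latex
% Square Root (de Vries-Rose) region: photon statistics limit detection
S \propto L, \qquad n \propto \sqrt{L}
\;\Longrightarrow\;
\Delta L_{th} \propto \sqrt{L}, \qquad
C_{th} = \frac{\Delta L_{th}}{L} \propto L^{-1/2}

% Weber region: threshold is a constant fraction of luminance
\Delta L_{th} \propto L
\;\Longrightarrow\;
C_{th} = \frac{\Delta L_{th}}{L} = \text{constant}
```

On a log-log plot of threshold versus luminance, these correspond to slopes of one half and one, respectively.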

Once the calibration constant (α) is determined by experiment, Equation 3.7 provides an
accurate means of predicting the effect of display noise on contrast threshold. As
described in the next section, however, some of the frequency spectrum of the display
noise is filtered out by the eye. The value of α is given after the noise filter is discussed.

3.3  Visual  Bandpass  Filters 

It  is  important  to  note  that  the  signal  and  noise  in  Equation  3.7  are  taken  with  respect  to 
the  bandpass  properties  of  the  human  visual  system.  In  other  words,  the  noise  that  affects 
a  particular  visual  process  does  not  include  all  frequencies  of  noise  capable  of  being 
represented on the display. Figure 3.7 provides an illustration of the eye filter acting on
the incoming signal and noise. This figure is provided as an aid to understanding that the
RMS noise in Equation 3.7 must be spatially filtered in order to get accurate predictions
of CTFn.

The  eye  exhibits  behavior  that  seems  to  imply  the  presence  of  selective  spatial  frequency 
channels.  Exposure  to  one  bar  pattern  or  sinewave  grating  can  affect  the  visibility  of  a 
second  pattern.  This  effect  is  termed  masking.  Masking  only  occurs,  however,  if  the  bar 
patterns are close to the same size and oriented in the same direction (Legge, 1987).

The  extent  to  which  noise  masks  a  signal  depends  on  the  spatial  frequency  of  the  signal 
and  the  spectral  content  of  the  noise  (Stromeyer,  1972;  van  Meeteren,  1988;  Greis,  1970). 
That  is,  the  noise  spectral  density  might  not  be  constant  over  the  frequency  limits  being 
considered.  If  the  noise  spectral  density  is  not  constant,  then  the  noise  is  “colored.”  The 
ability  of  colored  noise  to  mask  a  signal  depends  on  the  relative  position  of  the  signal  and 
noise in the frequency domain. CTFn depends on the power spectral density of the noise
rather than total noise power (Raghavan, 1989).

Figure 3.7 Spatial Filter Acts Upon Incoming Signal and Noise


Figure 3.8 shows the visual filters proposed by Barten (1999) based on fit to
psychophysical data. The filters shown are for 0.125, 0.25, and 0.5 cycles per milliradian
sinusoidal gratings. Equation 3.8 gives the formula for the Barten eye filter B(ξ); ξ_0 is the
frequency of the sinewave grating. When using Barten’s formulation, the signal is
expressed as modulation.


Figure  3.8  Illustration  of 
Three  Barten  Eye  Filters 


$B(\xi) = \exp\left[ -2.2 \, \log^2\left( \xi / \xi_0 \right) \right]$    (3.8)


As verified by numerical integration of Equation 3.8, the bandwidth of Barten’s filters
increases in proportion to ξ_0. Given a level of white noise, signal to noise increases in
proportion  to  the  square  root  of  bar  size.  This  is  because,  with  noise  power  spectral 
density  constant  over  the  frequency  band  of  interest,  the  noise  associated  with  a  filter  is 
proportional  to  the  square  root  of  bandwidth.  So  Barten’s  filters  work  in  lieu  of  the 
matched  filters  for  white  noise.  This  has  also  been  verified  by  using  both  types  of  filters 
with  Equation  3.7  to  predict  the  image  intensifier  experiment  reported  by  Vollmerhausen 
(1995).  Both  eye  filters  give  identical  results  in  white  noise. 
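
The numerical integration mentioned above can be sketched as follows. The filter is taken to be B(ξ) = exp[-2.2 log^2(ξ/ξ_0)] with a natural logarithm; both the exact form and the log base are assumptions here. The proportionality of bandwidth to ξ_0 holds for either base, because B depends only on the ratio ξ/ξ_0.

```python
import numpy as np


def barten_filter(xi, xi0):
    """Assumed Barten eye filter: exp(-2.2 * ln(xi/xi0)^2)."""
    xi = np.asarray(xi, dtype=float)
    return np.exp(-2.2 * np.log(xi / xi0) ** 2)


def noise_bandwidth(xi0, n=200001):
    """Integral of B^2 over frequency (trapezoid rule), in cycles/mrad."""
    xi = np.linspace(1e-6 * xi0, 50.0 * xi0, n)   # grid scales with xi0
    b2 = barten_filter(xi, xi0) ** 2
    return float(np.sum(0.5 * (b2[1:] + b2[:-1]) * np.diff(xi)))


bw_0125 = noise_bandwidth(0.125)
bw_0250 = noise_bandwidth(0.25)
bw_0500 = noise_bandwidth(0.5)
```

Each doubling of the grating frequency doubles the bandwidth, so in white noise the filtered RMS noise grows as the square root of ξ_0, consistent with the matched-filter behavior described in the text.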

However,  the  Barten  filters  also  work  with  colored  noise.  Barten  compares  predicted  and 
experimental  results  from  two  researchers  (van  Meeteren,  1988;  Stromeyer,  1972). 
Barten’s eye filters also predict the results of Chen (1994).

To illustrate the benefit of the Barten filters over the matched filters, consider the Air
Force 3-bar chart shown in Figures 3.9a through c. In Figure 3.9b, low frequency noise is
superimposed on the chart patterns. In Figure 3.9c, the image is corrupted with high
frequency noise; low spatial frequency noise has been filtered out. The standard deviation
of the noise is the same for both Figures 3.9b and 3.9c. Looking at the high frequency
bars  (the  small  bars  to  the  right,  center  of  the  picture),  the  high  frequency  noise  masks  the 
bars  more  than  the  low  frequency  noise.  If  the  eye  were  simply  integrating  over  a  bar 
area,  the  low  frequency  noise  would  actually  be  more  effective  in  masking  the  high 
frequency  bars. 




Figure  3.9  Air  Force  3-bar  Chart  Corrupted  by  Low  (b)  and  High  (c)  Frequency  Noise 


This  is  illustrated  by  Figure  3.10.  That  figure  shows  the  spectra  for  both  the  low  and  high 
frequency  noise.  Also  shown  are  both  the  matched  bar  filter  and  the  Barten  eye  filter 
associated  with  the  smallest  3-bar  pattern  in  Figure  3.9.  For  this  discussion,  it  is  assumed 
that  the  bar  charts  are  viewed  from  a  distance  five  times  the  width  of  one  chart;  Figure 
3.9  is  three  charts  wide.  The  smallest  3-bar  pattern  is  group  -1  pattern  6  with  a  spatial 
frequency  of  about  0.275  cycles  per  milliradian  when  viewed  from  a  distance  five  times 
the  width  of  one  chart. 
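
The masking asymmetry can be illustrated numerically by passing two noise spectra of equal total power through a band-pass eye filter centered on the 0.275 cycles per milliradian pattern. Everything specific below is an assumption made for illustration: the flat band-limited spectra, their band edges, and the filter form exp[-2.2 ln^2(ξ/ξ_0)]. None of it is taken from Figure 3.10.

```python
import numpy as np


def barten_filter(xi, xi0):
    """Assumed Barten eye filter: exp(-2.2 * ln(xi/xi0)^2)."""
    return np.exp(-2.2 * np.log(xi / xi0) ** 2)


def band_limited_psd(xi, lo, hi, variance):
    """Flat power spectral density on [lo, hi] whose integral is `variance`."""
    return np.where((xi >= lo) & (xi <= hi), variance / (hi - lo), 0.0)


def integrate(y, x):
    """Trapezoid-rule integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


xi = np.linspace(1e-4, 2.0, 200001)   # cycles/mrad
xi0 = 0.275                           # smallest 3-bar pattern
b2 = barten_filter(xi, xi0) ** 2

low_psd = band_limited_psd(xi, 1e-4, 0.1, 1.0)   # hypothetical low-frequency noise
high_psd = band_limited_psd(xi, 0.2, 0.6, 1.0)   # hypothetical high-frequency noise

low_power = integrate(low_psd * b2, xi)    # noise power passed by the eye filter
high_power = integrate(high_psd * b2, xi)
```

Although both spectra carry the same total power, far more of the high-frequency noise falls inside the filter passband, which matches the stronger masking of the small bars seen in Figure 3.9c.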


Figure 3.10 Plot of Noise Spectra and Eye
Filters. The low frequency noise associated with
Figure 3.9b and the high frequency noise
associated with Figure 3.9c are plotted. Also
shown are the Richards and Barten eye filters
associated with the smallest bar pattern.
[Plot abscissa: spatial frequency (cycles/milliradian)]


Because  the  matched  filter  represents  an  integration  over  the  bar  area,  the  matched  filter 
has  a  better  response  at  DC  than  at  higher  spatial  frequencies.  The  matched  filter  cannot 
explain  masking.  However,  the  Barten  filters  are  consistent  with  observed  masking 
behavior.  Equation  3.7  with  the  Equation  3.9  eye  filters  provides  the  foundation  for 
predicting the image quality of imaging sensors.

3.4 Validity of Weber’s Law

The CTFn model assumes contrast constancy. That is, threshold luminance increases in
proportion  to  display  luminance;  this  is  Weber’s  Law.  Many  researchers  would  object 
that  this  is  not  the  behavior  of  a  normal  square  law  detector.  Further,  certainly  a  system  as 
highly  evolved  as  the  human  eye  would  better  approach  the  theoretical  limits  represented 
by  the  de-Vries-Rose  Square  Root  Law.  Perhaps,  however,  the  eye  is  not  a  normal  square 
law  detector.  And  perhaps  it  is  not  optimized  for  liminal  photon  detection. 

From  a  purely  physical  standpoint,  photo-chemicals  in  the  eye  are  leached  out  by  light; 
absolute  quantum  efficiency  of  the  eye  decreases  as  illumination  increases.  Further,  it  is 
certainly  not  unreasonable  to  assume  that  the  eye-brain  system  is  optimized  for  higher 
order  discriminations.  Perhaps  contrast  constancy  and  color  constancy  provide  the  visual 
system  a  way  of  adapting  to  changing  environments.  In  an  evolutionary  sense,  it  may  be 
that  a  visual  system  which  responds  uniformly  at  sunrise,  noon,  sunset,  in  the  open,  in  a 
cave,  or  under  the  shade  of  a  tree  is  more  important  than  the  absolute  level  at  which  a 
faint  light  is  detected  on  a  dark  night. 

Unfortunately,  experimental  support  for  Weber’s  Law  is  mixed;  some  experiments 
support  the  idea  of  contrast  constancy  over  a  large  variation  in  illumination,  other 
experiments  do  not.  It  is  necessary,  therefore,  to  discuss  this  assumption  in  more  detail  as 
it  relates  to  our  CTFn  model. 

Figure  3.11  plots  absolute  threshold  versus  display  luminance  for  spatial  frequencies 
between  0.1  and  1.5  cycles  per  milliradian  (cy/mrad).  The  threshold  predictions  are  based 
on  Barten’s  CTF  numerical  fit  (Barten,  2000)  and  on  a  numerical  fit  to  eye  MTF  (see 
Appendix  E).  The  figure  also  shows  a  straight  line;  if  Weber’s  Law  were  exact,  then  all 
of  the  CTF  data  would  lie  on  a  straight  line.  A  dotted  line  representing  the  de-Vries-Rose 
Square  Root  Law  is  also  shown.  The  exact  ordinate  position  of  the  lines  is  not  relevant, 
because  experimental  data  are  used  to  calibrate  the  model.  The  question  addressed  here  is 
the  functional  relationship  between  CTF  and  luminance. 


Figure 3.11 Plot showing CTF at 8
spatial frequencies for luminance
levels between 0.001 and 1000 fL.
The solid, straight line represents
“Weber’s Law.” The dotted line
represents the “Square Root Law.”
[Plot: threshold (ordinate, logarithmic, 0.00001 to 10) versus luminance in fL (abscissa, 0.001 to 1000); legend: sq. rt. law, linear, and spatial frequencies 0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, and 1.5 cy/mrad.]

The  CTF  data  are  adjusted  to  remove  the  effect  of  MTF  variations  which  result  from 
differences in pupil dilation as luminance changes. A correction for pupil dilation is
included in the eventual model which is described in Part 4; the pupil-related changes in
CTF should not be included in the current discussion.

Because of eyeball MTF and the functioning of the visual cortex, each spatial frequency
transitions at a different light level from the Square Root Law region to the Weber’s Law
region and eventually to saturation. Low frequencies are seen at very low luminance
levels. High spatial frequencies require more light. So Weber’s Law is applicable to
different spatial frequencies at different light levels. This is reflected in the figure by the
absence of high spatial frequency predictions for low luminance.

Weber’s  Law  is  not  exact,  but  it  better  fits  our  needs  than  the  Square  Root  Law.  Looking 
at Figures 3.2, 3.4, and 3.11, the CTFn model can be expected to provide accurate
predictions for display luminances between 0.01 and 100 fL and approximate predictions
from about 0.001 to about 1,000 fL.

4 CONTRAST THRESHOLD FUNCTION OF AN IMAGER

The observer in Figure 4.1 is viewing the scene through an imager and trying to identify
the target. The imager helps her by magnifying the target and by permitting the
observation of illumination not normally visible to the eye. However, the camera and
display add noise and blur. This section describes the observer’s Contrast Threshold
Function when looking through the imaging system. The system Contrast Threshold
Function (CTFsys) is the naked-eye CTF degraded by the amount necessary to account for
the blur and noise added by the imager.


Figure  4.1  Observer  viewing 
scene  through  an  imager  is 
trying  to  identify  the  target. 


As an aid in understanding the formulas for CTFsys, a simple imaging system is illustrated
in  Figure  4.2.  An  objective  lens  focuses  light  onto  a  focal  plane  array  (FPA)  of  detectors. 
Photo-current  is  generated  over  the  active  area  of  an  individual  detector;  the  active  area  is 
indicated  by  the  hatched  areas  shown  in  the  inset.  The  scene  is  blurred  because  of 
diffraction  and  aberrations  in  the  objective  lens;  the  scene  is  also  blurred  because  of  the 
finite  size  of  the  active  detector  area.  The  signal  is  the  total  photo-current  in  each 
detector;  shot  noise  is  added  to  the  signal  by  the  statistical  nature  of  the  photo-detection 
process. The individual detector samples are electronically formatted, perhaps filtered
electronically, and then displayed. So blur can be added by the electronics. The display
pixels have a finite size, and this also adds blur to the image. Finally, the eyeball adds
blur.

Figure 4.2 Illustration of a staring imager. Optics, detector, display, electronics,
and eye all blur the image. Noise is added to the signal during photo-detection.

The  signal  is  blurred  by  every  component  in  the  image  processing  chain;  the  noise  is  only 
blurred  by  components  subsequent  to  photo-detection.  In  the  model,  the  Fourier 
transform  of  the  total,  signal  blur  is  called  system  MTF,  whereas  the  Fourier  transform  of 
the  blur  which  filters  noise  is  called  the  noise  filter  MTF.  Noise  filter  MTF  is  a 
component  of  system  MTF. 

4.1  Effect  of  Blur  on  Imager  CTF 

The  effect  of  noise  on  CTF  has  been  discussed,  but  the  effect  of  blur  has  not  yet  been 
quantified.  In  Figure  4.3a,  the  sinewave  chart  is  just  visible  to  the  observer.  In  4.3b,  an 
optical  system  has  been  introduced  between  the  display  and  the  eye,  reducing  the  visible 
modulation  to  below  threshold.  Assume  unity  magnification  and  that  the  telescope  MTF 
is Hsys(ξ). In 4.3c, the displayed modulation has been increased so that the sinewave is
once again visible. The display modulation must be increased by the amount lost in the
optics. Equation 3.7 for CTFn is modified as shown in Equation 4.1 to yield CTFsys.
CTFsys is the Contrast Threshold Function through the imager; it degrades naked-eye
CTF  by  the  amount  necessary  to  account  for  imager  noise  and  blur. 
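
The reasoning above (displayed modulation must be raised by the amount lost in the optics) suggests dividing the threshold in noise by the system MTF. The sketch below assumes that form, together with the noise factor sqrt(1 + α^2 σ^2 / L^2) from Equation 3.7 as read from the surrounding derivation. The function name and the numerical values are hypothetical, and magnification is taken as unity, as in the text.

```python
import math


def ctf_sys(ctf_eye, h_sys, sigma, luminance, alpha):
    """Assumed one-dimensional system CTF at a single spatial frequency.

    ctf_eye   : naked-eye contrast threshold at this frequency
    h_sys     : system MTF at this frequency (blur), must be positive
    sigma     : eye-filtered RMS display noise
    luminance : average display luminance L
    alpha     : calibration constant (placeholder value in the example)
    """
    if h_sys <= 0:
        raise ValueError("h_sys must be positive")
    if luminance <= 0:
        raise ValueError("luminance must be positive")
    noise_factor = math.sqrt(1.0 + (alpha * sigma / luminance) ** 2)
    return ctf_eye / h_sys * noise_factor


blur_only = ctf_sys(0.01, 0.5, 0.0, 1.0, 100.0)        # MTF of 0.5 doubles the threshold
blur_and_noise = ctf_sys(0.01, 0.5, 0.03, 1.0, 100.0)  # noise raises it further
```

A system threshold at or above unity at some frequency means a sinewave at that frequency cannot be seen through the imager at any scene contrast.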


Figure  4.3  Just-visible  sinewave  modulation 
in  (a)  is  decreased  by  the  introduction  of  the 
telescope  in  (b).  The  display  modulation  must 
be  increased  for  the  sinewave  to  once  again  be 
visible in (c).

Equation 4.1 is one-dimensional, but imagery has two dimensions. In our models, sensors
are analyzed in the vertical and horizontal directions separately, and a summary
performance calculated from the separate analyses. The point spread function, psf, and
the associated MTF are assumed to be separable in Cartesian coordinates. The
separability assumption reduces the analysis to one dimension so that complex
calculations  that  include  cross-terms  are  not  required.  This  approach  allows 
straightforward  calculations  that  quickly  determine  sensor  performance. 

The  separability  assumptions  are  almost  never  satisfied,  even  in  the  simplest  cases.  There 
is  generally  some  calculation  error  associated  with  assuming  separability.  Generally,  the 
errors  are  small,  and  the  majority  of  scientists  and  engineers  use  the  separability 
approximation.  However,  care  should  be  taken  not  to  apply  the  model  to  circumstances 
which  are  obviously  not  separable;  for  example,  diagonal  dither  cannot  be  modeled 
correctly,  nor  can  diamond  shaped  detectors. 
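
Separability means the two-dimensional MTF is the product of a horizontal and a vertical one-dimensional MTF, so that a sampled MTF forms a rank-one matrix. The sketch below contrasts a separable case with a diamond-shaped detector (a square rotated 45 degrees). The Gaussian MTFs and the detector width are hypothetical values chosen for illustration.

```python
import numpy as np

xi = np.linspace(0.0, 2.0, 41)    # horizontal frequency, cycles/mrad
eta = np.linspace(0.0, 2.0, 41)   # vertical frequency, cycles/mrad

# Separable case: outer product of two 1-D MTFs (hypothetical Gaussians)
h = np.exp(-(xi / 1.0) ** 2)
v = np.exp(-(eta / 1.5) ** 2)
separable = np.outer(v, h)        # rows index eta, columns index xi

# Non-separable case: square detector of width w (mrad) rotated 45 degrees.
# Its MTF is a product of sincs along the rotated axes, not along xi and eta.
w = 0.5
XI, ETA = np.meshgrid(xi, eta)
diamond = (np.sinc(w * (XI + ETA) / np.sqrt(2.0))
           * np.sinc(w * (XI - ETA) / np.sqrt(2.0)))

rank_separable = np.linalg.matrix_rank(separable)
rank_diamond = np.linalg.matrix_rank(diamond)
```

A numerical rank above one for the diamond detector shows that no pair of one-dimensional MTFs reproduces it, which is why the text warns against applying the model to such cases.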

Since  most  imagers  do  not  exhibit  the  same  resolution  characteristics  in  the  horizontal 
and  vertical  directions,  the  CTF  in  each  direction  must  be  modeled  separately.  The 
sinewave  pattern  at  the  left  in  Figure  4.4  is  used  to  generate  horizontal  CTF  modulation; 
the  pattern  to  the  right  is  used  to  generate  vertical  CTF  modulation. 


Figure 4.4 Charts Used to Generate CTF
Modulation. The left-hand chart is used for
horizontal CTF; the right-hand chart is used
for vertical CTF.


Also,  most  imagers  have  a  magnification  different  than  unity;  the  scene  is  magnified  and 
objects  look  bigger  than  without  the  imager.  In  our  models,  the  calculations  are  done  in 
the  spatial  frequency  domain  associated  with  object  space.  Spatial  frequency  at  the  eye 
(ξ_eye) is related to spatial frequency in object space (ξ) by:

$\xi_{eye} = \xi / SMAG$    (4.2)


where  SMAG  is  the  system  magnification. 
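
Because magnification makes objects subtend a larger angle at the eye, a given object-space frequency appears at a lower frequency at the eye. A minimal sketch of the conversion, assuming the relation is a simple division by SMAG (the function name is illustrative):

```python
def eye_frequency(xi_object, smag):
    """Spatial frequency at the eye (cycles/mrad) for an object-space frequency.

    xi_object : spatial frequency in object space, cycles/mrad
    smag      : system (angular) magnification, must be positive
    """
    if smag <= 0:
        raise ValueError("smag must be positive")
    return xi_object / smag


xi_at_eye = eye_frequency(2.0, 4.0)   # 2 cycles/mrad in the scene, 4x magnification
```

The naked-eye CTF is then evaluated at the eye frequency, while results are reported against the object-space frequency.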

Equations  4.3  and  4.4  give  the  horizontal  and  vertical  noise  bandwidths,  respectively, 
which  are  associated  with  calculating  horizontal  system  CTF  (CTFHsys).  The  formula  for 
CTFHsys  is  given  in  Equation  4.5.  Equations  4.6  and  4.7  show  the  horizontal  and  vertical 
noise bandwidths, respectively, for vertical system CTF (CTFVsys). Equation 4.8 shows
the calculation of CTFVsys. In these equations, ξ' and η' are dummy variables with units
of cycles per milliradian in object space. The integrations are over all frequencies.

In these equations,

ξ = horizontal spatial frequency in (milliradian)^-1
η = vertical spatial frequency in (milliradian)^-1
ρ = detector noise power spectral density in units of fL-second-milliradian
L = display luminance in fL
SMAG = angular magnification
B(ξ or η) = the Equation (3.9) eye filters
Heye(ξ or η) = eyeball MTF
Helec(ξ) = horizontal electronics MTF
Velec(η) = vertical electronics MTF
Hdsp(ξ) = horizontal display MTF
Vdsp(η) = vertical display MTF
Hsys(ξ) = horizontal system MTF
Vsys(η) = vertical system MTF
QHhor = horizontal noise bandwidth for CTFHsys
QVhor = vertical noise bandwidth for CTFHsys
QHver = horizontal noise bandwidth for CTFVsys
QVver = vertical noise bandwidth for CTFVsys

4.2  Effect  of  Contrast  Enhancement  on  Imager  CTF 

An  assumption  used  in  the  derivation  of  Equations  4.1,  4.5,  and  4.8  is  that  the  luminance 
variations  on  the  display  are  proportional  to  the  luminance  or  temperature  variations  in 
the  scene.  If  the  imager  has  independent  gain  and  level  controls,  this  proportionality  can 
be  lost.  In  fact,  since  gain  enhancement  can  improve  target  acquisition  performance,  it  is 
likely  that  proportionality  will  not  exist  under  low  contrast  conditions. 

Contrast  enhancement  is  achieved  by  gaining  the  signal  and  then  lowering  the  display 
brightness  back  to  the  original  value.  This  is  illustrated  in  Figure  4.6.  In  panel  (a),  the 
display  luminance  is  proportional  to  scene  variations  in  luminance  (or  temperature  for 
thermal imagers). The figure shows an average luminance (L) and a change in luminance
(ΔL). In panel (b), the display luminance is gained by a factor Kcon. All display luminance
values increase, including the average display luminance. In panel (c), the display
brightness control is used to decrease average display brightness back to the original
value (L). However, the change in luminance is now Kcon ΔL. The contrast has
increased by a factor of Kcon.
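The gain-then-level sequence described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `enhance_contrast` and the sample luminances are hypothetical, and the display is treated as ideal (no clipping at zero or at peak luminance).

```python
def enhance_contrast(luminances, k_con):
    """Apply gain k_con, then lower brightness so the average is unchanged."""
    average = sum(luminances) / len(luminances)
    gained = [k_con * value for value in luminances]   # panel (b): every value scales
    offset = (k_con - 1.0) * average                   # extra average introduced by the gain
    return [value - offset for value in gained]        # panel (c): average restored


# Hypothetical display luminances in fL: average L = 5, delta L = 1.
original = [4.5, 5.0, 5.5]
enhanced = enhance_contrast(original, k_con=2.0)

average_after = sum(enhanced) / len(enhanced)
delta_after = max(enhanced) - min(enhanced)
print(enhanced, average_after, delta_after)
```

With a gain of 2, the average stays at 5 fL while the luminance excursion doubles, which is the contrast increase by Kcon described in the text.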




Figure  4.6  Panel  (a)  shows  display  luminance  proportional  to  scene  variations  in 
luminance  or  temperature.  Panel  (b)  shows  a  signal  gain  of  Kcon.  In  (c),  average 
luminance  is  the  same  as  in  (a),  and  display  contrast  has  increased  by  Kcon. 


While gain enhancement does increase perceived noise, noise only increases in
proportion to signal. The net effect of gain enhancement is to reduce the impact of eye
contrast limitations on performance. With an electronic contrast improvement of Kcon,
Equation 4.1 becomes:

Similarly, Equations 4.5 and 4.8 for horizontal and vertical CTFsys become:

4.3 The Effect of Display Glare on Imager CTF

In Figure 4.7, the soldier’s ability to see the display depends on the environment; sunlight
reflecting off the display surface can hide the image. Display glare can also be caused by
maladjustment of the display brightness control. Whatever the cause, glare can seriously
degrade targeting performance. Glare represents a reduction in contrast at all spatial
frequencies. The display modulation loss is:

(4.12)

where Lglare is the glare luminance and L is the average display luminance. Equations 4.1,
4.5, and 4.8 now become:

Equations 4.14 and 4.15 describe quality of vision when using an imager. Different types
of electro-optical sensors are modeled by analyzing the blurs and noise associated with
the particular technology. Specific formulations for CTFHsys and CTFVsys for various
types of imagers are derived later in this report.


Figure 4.7 At left, clouds are obscuring the sun, and the soldier sees the display clearly.
At  right,  the  sun  is  out,  and  glare  from  the  display  hides  the  underlying  image. 


4.4  Limit  on  Modulation  Gain 

Electronic  or  digital  processing  can  boost  intermediate  and  high  spatial  frequencies, 
improving the displayed representation of scene detail and enhancing target acquisition
performance. An example of high frequency boost is discussed in Appendix B; in that
example, the high frequencies of a blurred image are boosted by a factor of eight, and the
peak “after boost” image modulation is a factor of 1.7 greater than in the original,
unblurred image. In that particular experiment, boost increased the probability of correctly
identifying  targets  by  about  0.2.  Because  of  the  TTP  metric  and  the  new  eye  filters,  this 
type  of  realistic  image  improvement  can  now  be  modeled. 

Since  the  Static  Performance  Model  was  first  published  in  1975,  however,  there  has  been 
a  general  confusion  about  how  modulation  gain  is  handled  in  NVL  and  NVESD  models. 
The  confusion  can  be  clarified  by  describing  how  sensor  system  gain  is  established  in  the 
model. 

Sensor  gain  is  established  by  specifying  display  minimum  and  average  luminance.  For 
reflected-light sensors, this tells the model the delta display luminance that corresponds
to  the  scene  illumination  and  target  to  background  reflectance  differences.  For  thermal 
imagers,  the  system  gain  is  established  by  specifying  scene  contrast  temperature;  this  is 
the  scene  delta  temperature  that  generates  the  average  display  luminance.  This  indirect 
method  of  specifying  system  gain  is  much  simpler  for  the  model  user  than  requiring  that 
actual  gain  state  be  input. 

The  model  user  could  be  asked  to  specify  component-by-component  absolute  gain.  This 
would mean inputting the responsivity of the detector, the actual gain of any
automatic-gain-control electronics, and the gain of the display (voltage input to
luminance output). A version of the image intensifier CCD (I²CCD) model used this method
of specifying system gain; the method was universally hated by model users. The I²CCD model used
this  approach  because,  at  very  low  illumination  levels,  the  early  cameras  could  not 
output  sufficient  voltage  to  drive  video  displays  to  the  desired  output  luminance;  the 
model  had  to  estimate  available  output  luminance.  Using  the  model,  however,  required 
providing  information  about  electronics  and  display  design  not  normally  available  to 
systems  analysts. 

Modern imagers, including current I²CCD cameras, provide sufficient gain that the
operational  user  can  set  the  display  luminance  as  desired.  By  understanding  the 
operational  user’s  environment  and  needs,  the  systems  analyst  can  make  a  good  estimate 
of the display luminance which will be chosen by the hardware user. For example, an aviator
flying without a pilotage aid will keep luminance from instrumentation displays at 0.1 to
0.3 fL in order to maintain dark adaptation; he wants to see outside as well as see his
instruments. On the other hand, if the aviator is using a pilotage aid like the Aviator’s
Night Vision Imaging System (I² goggles) or the Pilot’s Night Vision System (a thermal
imager and helmet display system), then display luminance is typically set in the 1 to 10
fL region; with the higher display luminance, he sees both instrument information and the
outside  scene  better.  Generally,  the  systems  analyst  can  make  a  reasonable  estimate  of 
display  luminance  if  he  understands  the  operational  user’s  task  and  environment. 

In the experiment described in Appendix B, the average display luminance was 5 fL. A
display signal modulation of 1.0 at any spatial frequency means that a fully modulated
sinewave in the scene would be displayed with a peak-to-peak luminance of 10 fL.
Suggesting that the modulation could be 1.7 would mean that the sinewave would have a
peak-to-peak displayed luminance of 17 fL and an average luminance of 8.5 fL. This is
not true; the average luminance is 5 fL. A display modulation greater than one makes no
physical sense.
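A small numerical check may make the argument concrete. The sketch below assumes the usual definition of sinewave modulation as amplitude divided by average luminance; the function name `sinewave_extremes` is hypothetical. With the average held at 5 fL, a modulation of 1.7 would require a negative trough luminance, which no display can produce.

```python
def sinewave_extremes(average_fl, modulation):
    """Return (trough, peak) luminance in fL for a displayed sinewave."""
    amplitude = modulation * average_fl   # modulation = amplitude / average
    return average_fl - amplitude, average_fl + amplitude


# Fully modulated sinewave at 5 fL average: trough 0 fL, peak 10 fL.
trough_full, peak_full = sinewave_extremes(5.0, 1.0)

# A modulation of 1.7 at the same 5 fL average needs a trough below zero.
trough_boost, peak_boost = sinewave_extremes(5.0, 1.7)

print(trough_full, peak_full)
print(trough_boost, peak_boost)
```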

So  system  gain  is  established  by  display  luminance;  it  is  not  established  by  multiplying 
the  various  component  MTF.  The  purpose  of  component  MTF  is  to  establish  the  relative 
frequency spectrum of the displayed image. Hsys and Vsys in Equations 4.14 and 4.15 are
normalized to a peak MTF of 1.0.


4.5 Example Calculation of CTFsys

This  section  presents  an  example  to  illustrate  how  the  contrast  threshold  function  through 
the  imager  is  calculated;  this  is  done  using  the  blur  and  noise  characteristics  of  the 
imager.  A  staring  imager  is  shown  in  Figure  4.8.  Light  is  focused  on  the  focal  plane  array 
(FPA)  by  the  objective  lens;  the  image  is  blurred  by  both  the  lens  and  the  finite  size  of  the 
detectors  on  the  FPA.  Noise  is  added  by  the  photo-detection  process.  The  signal  and 
noise  are  filtered  by  the  electronics  and  display. 


Figure 4.8 Schematic diagram of a staring imager.

The  imager  has  the  following  characteristics: 

Focal length = 30 centimeters (cm)

Aperture diameter = 10 cm

Array size = 640 horizontal by 480 vertical detectors

Detector size = 20 microns on 20 micron pitch (100% fill factor)

Instantaneous field of view = 0.067 milliradians

Half-sample frequency = 7.5 (milliradian)^-1

System magnification = 10
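The instantaneous field of view and half-sample frequency follow from the focal length and detector pitch listed above. The following is a minimal sketch using the small-angle approximation; the variable names are illustrative only.

```python
# Geometry of the example staring imager (values from the list above).
focal_length_m = 0.30        # 30 cm focal length
detector_pitch_m = 20e-6     # 20 micron pitch, 100% fill factor

# Small-angle approximation: angle (rad) = pitch / focal length.
ifov_mrad = detector_pitch_m / focal_length_m * 1e3

# One sample per detector pitch; the half-sample frequency is half the sample rate.
samples_per_mrad = 1.0 / ifov_mrad
half_sample_freq = samples_per_mrad / 2.0

print(round(ifov_mrad, 3), round(samples_per_mrad, 1), round(half_sample_freq, 2))
```

The 15 samples per milliradian obtained here is the same pixel density used later in the noise power spectral density calculation.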

The  system  MTF  (including  optics,  detector,  and  display  MTF)  is: 

=  4.16 

where ξ is spatial frequency in cycles per milliradian (cy/mrad). Post-filter MTF from
electronics  and  display  is: 

=  4.17 

The display luminance is 5 fL; at this luminance, Equation 4.18 provides a good
approximation for eye MTF, and Equation 4.19 is a good approximation for eye CTF.
See the appendix of Chapter 12 in Vollmerhausen (2000) for numerical fits for other
display luminances.

$$H_{eye}(\xi) = e^{-2.2\,\xi / SMAG} \tag{4.18}$$


C7F(a  = - ,  -  4.19 

where 

^  _  (0.O6544  +  0.764^ /SM4G  + 2.2164^^  /m4G^  )l0^  ^  2o 

1 9  +  8 1 .54^ /  m4G  +  237^^  /  5M4G^ 


Equation 4.21 gives the formula for the eye filters B(ξ′) proposed by Barten. ξ is the
frequency of the sinewave grating; ξ′ is a dummy variable used to integrate over noise
bandwidth. The eye MTF in Equation 4.18 is from the eyeball; Equation 4.21 represents
the bandpass filters associated with higher-order visual processing in the visual cortex.

$$B(\xi') = \exp\left\{ -2.2 \left[ \log\left( \frac{\xi'}{\xi} \right) \right]^{2} \right\} \tag{4.21}$$


In order to calculate CTFsys, the RMS display noise σ must be determined. Since σ is the
noise as sensed by the eye, the hardware display noise must be filtered by the eye
temporal integration, eye MTF, and by the bandpass filter in Equation 4.21. To calculate
σ, the power spectral density associated with the display noise is found and then
multiplied by the noise bandwidths.

Assume that the signal to noise ratio for the average pixel is 8:1. The power spectral
density (psd) is the square of the RMS noise for one second and one milliradian in each
dimension.  There  are  60  frames  per  second  and  15  pixels  per  milliradian  in  each 
direction.  Noise  increases  as  the  square  root  of  the  number  of  independent  samples 
summed.  The  signal  integrated  over  the  same  angle  and  time  results  in  a  5  fL  display 
luminance.  The  integrated  signal  increases  in  proportion  to  the  number  of  samples.  The 
psd  is  therefore: 

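$$psd = \left[ \frac{5}{8} \cdot \frac{\sqrt{60 \cdot 15 \cdot 15}}{60 \cdot 15 \cdot 15} \right]^{2} = (0.0054)^{2} \ \mathrm{fL^{2} \cdot second \cdot milliradian^{2}} \tag{4.22}$$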
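The arithmetic behind Equation 4.22 can be traced step by step. The sketch below follows the reasoning in the paragraph above (signal adds linearly, noise adds as the square root of the number of independent samples); the function name `noise_psd` and its parameter names are hypothetical.

```python
import math


def noise_psd(luminance_fl, snr, frame_rate_hz, pixels_per_mrad):
    """Noise power spectral density in fL^2 * second * milliradian^2."""
    # Independent samples in one second and one milliradian in each dimension.
    samples = frame_rate_hz * pixels_per_mrad ** 2
    # The integrated signal equals the display luminance, so each sample carries 1/N of it.
    signal_per_sample = luminance_fl / samples
    noise_per_sample = signal_per_sample / snr
    # Noise grows as the square root of the number of samples summed.
    integrated_noise = noise_per_sample * math.sqrt(samples)
    return integrated_noise ** 2


psd = noise_psd(luminance_fl=5.0, snr=8.0, frame_rate_hz=60, pixels_per_mrad=15)
print(math.sqrt(psd))   # RMS value, close to the 0.0054 quoted in Equation 4.22
```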


The  spatial  psd  is  two-sided;  that  is,  frequency  integrations  are  taken  from  minus  infinity 
to  infinity. 

Contrast threshold of the imager (CTFsys) can now be calculated.

$$CTF_{sys}(\xi) = \frac{CTF_{eye}(\xi)}{H_{sys}(\xi)} \left[ 1 + \frac{\gamma^{2}\, psd\, Q_H(\xi)\, Q_V\, Q_t(L)}{L^{2}} \right]^{1/2} \tag{4.23}$$


where γ is a unitless calibration constant which is not the same as the parameter α in
Equations 4.5, 4.8, 4.9, etcetera. The relationship between γ and α is explained below.
Qt(L) is the eye temporal filter at luminance L. Equations 4.24 and 4.25 provide the
spatial filters for horizontal and vertical, respectively.


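$$Q_H(\xi) = \int_{\mathrm{all}\ \xi'} \left[ B(\xi')\, H_{post}(\xi')\, H_{eye}(\xi') \right]^{2} d\xi' \tag{4.24}$$

$$Q_V = \int_{\mathrm{all}\ \xi} \left[ H_{post}(\xi)\, H_{eye}(\xi) \right]^{2} d\xi = 3.2 \tag{4.25}$$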
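The noise bandwidths in Equations 4.24 and 4.25 can be evaluated numerically. The sketch below is illustrative and rests on several assumptions that are not taken from the report: the post-filter MTF `h_post` is a Gaussian stand-in chosen to approximate the Hpost column of Table 4.1 (it is not Equation 4.17), the eyeball MTF is read as exp(-2.2 ξ/SMAG), the logarithm in Equation 4.21 is taken as the natural logarithm, and the integrals are two-sided. Under those assumptions the results land close to the tabulated QH values and to the quoted QV of 3.2.

```python
import math

SMAG = 10.0  # system magnification of the example imager


def h_eye(xi):
    # Eyeball MTF, read from Equation 4.18 as exp(-2.2 * xi / SMAG) (assumption).
    return math.exp(-2.2 * abs(xi) / SMAG)


def h_post(xi):
    # ASSUMPTION: Gaussian stand-in fitted to the Hpost column of Table 4.1.
    return math.exp(-0.035 * xi * xi)


def barten_filter(xi_prime, xi):
    # Equation 4.21 bandpass filter; natural logarithm is an assumption.
    return math.exp(-2.2 * math.log(abs(xi_prime) / xi) ** 2)


def integrate_two_sided(func, upper=40.0, steps=40000):
    # Midpoint rule on (0, upper), doubled because the integrands are even.
    width = upper / steps
    total = sum(func((i + 0.5) * width) for i in range(steps))
    return 2.0 * total * width


def q_h(xi):
    return integrate_two_sided(
        lambda x: (barten_filter(x, xi) * h_post(x) * h_eye(x)) ** 2)


def q_v():
    return integrate_two_sided(lambda x: (h_post(x) * h_eye(x)) ** 2)


print(round(q_v(), 2), round(q_h(0.5), 2), round(q_h(1.0), 2), round(q_h(3.0), 2))
```

The midpoint rule is used so that the integrand is never evaluated at zero frequency, where the logarithm in the bandpass filter is undefined.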


The unit of Qt is Hertz and the unit for QV and QH is (milliradian)^-1.

Qt is not explicitly evaluated. Variations in Qt directly affect the CTF of the eye, so the
effect of varying Qt is subsumed by the CTF eye factor in Equation 4.23. Eye integration
time varies with light level. This is a naturally occurring process, and it is one factor
that helps to establish the CTF at a given light level. The resulting variations in temporal
bandwidth affect both signal and noise, and the impact on signal to noise is the same
whether the noise is external or internal. As a result, the natural CTF variation with light
level adjusts the noise term in Equation 4.23 in the correct manner without further
intervention. This means that the product γ √Qt can be treated as a constant, which we
define as α, with units of root-Hertz. As described below, the value of α is 169.6
root-Hertz. Note that α is not the temporal bandwidth of the eye. Note also that
Equation 4.23 only applies to continuously varying, temporal noise such as occurs with
framing imagers. Adapting the theory to single frame (snapshot) imagery is not difficult;
see Section 8.1.2.

An array of (frequency, CTFsys) values can now be calculated to be used in a numerical
integration to find TTP. Table 4.1 gives values for CTFsys, CTFeye, Heye, Hsys, Hpost, and
QH for several values of spatial frequency.


Table 4.1 Calculated values for CTF and MTF versus spatial frequency.

frequency   CTFsys   CTFeye    Heye   Hsys      Hpost   QH
(cy/mrad)                                               (mrad^-1)
0.5         .0062    5.9E-03   0.90   0.98      0.99    .67
1           .0039    3.4E-03   0.80   0.93      0.97    .98
1.5         .0035    2.8E-03   0.72   0.85      0.92    1.04
2           .0037    2.7E-03   0.64   0.76      0.87    .97
2.5         .0044    2.7E-03   0.58   0.65      0.80    .83
3           .0057    2.9E-03   0.52   0.53      0.73    .68
3.5         .0078    3.2E-03   0.46   0.42      0.65    .54
4           .011     3.6E-03   0.41   0.33      0.57    .42
4.5         .017     4.1E-03   0.37   0.24      0.49    .32
5           .027     4.6E-03   0.33   0.17      0.42    .25
5.5         .043     5.1E-03   0.30   0.12      0.35    .19
6           .072     5.8E-03   0.27   0.08      0.28    .14
6.5         .13      6.5E-03   0.24   5.2E-02   0.23    .10
7           .23      7.3E-03   0.21   3.2E-02   0.18    .08
7.5         .42      8.2E-03   0.19   0.019     0.14    .06
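The CTFsys column of Table 4.1 can be cross-checked against the other columns. The sketch below assumes Equation 4.23 with the product γ²Qt replaced by α² (α = 169.6 root-Hertz), the psd of Equation 4.22, QV = 3.2, and L = 5 fL; the function and variable names are illustrative. Because the tabulated inputs are rounded to two significant figures, agreement is only expected to within a few percent.

```python
import math

ALPHA = 169.6          # calibration constant, root-Hertz
PSD = 0.0054 ** 2      # fL^2 * second * milliradian^2, Equation 4.22
Q_V = 3.2              # vertical noise bandwidth, Equation 4.25
LUMINANCE = 5.0        # average display luminance, fL


def ctf_sys(ctf_eye, h_sys, q_h):
    """System contrast threshold from eye CTF, system MTF, and noise bandwidth."""
    noise_term = ALPHA ** 2 * PSD * q_h * Q_V / LUMINANCE ** 2
    return ctf_eye / h_sys * math.sqrt(1.0 + noise_term)


# Rows from Table 4.1: (frequency, CTFsys, CTFeye, Hsys, QH).
TABLE_4_1 = [
    (0.5, 0.0062, 5.9e-3, 0.98, 0.67), (1.0, 0.0039, 3.4e-3, 0.93, 0.98),
    (1.5, 0.0035, 2.8e-3, 0.85, 1.04), (2.0, 0.0037, 2.7e-3, 0.76, 0.97),
    (2.5, 0.0044, 2.7e-3, 0.65, 0.83), (3.0, 0.0057, 2.9e-3, 0.53, 0.68),
    (3.5, 0.0078, 3.2e-3, 0.42, 0.54), (4.0, 0.011, 3.6e-3, 0.33, 0.42),
    (4.5, 0.017, 4.1e-3, 0.24, 0.32), (5.0, 0.027, 4.6e-3, 0.17, 0.25),
    (5.5, 0.043, 5.1e-3, 0.12, 0.19), (6.0, 0.072, 5.8e-3, 0.08, 0.14),
    (6.5, 0.13, 6.5e-3, 0.052, 0.10), (7.0, 0.23, 7.3e-3, 0.032, 0.08),
    (7.5, 0.42, 8.2e-3, 0.019, 0.06),
]

relative_errors = []
for freq, tabulated, ctf_eye, h_sys, q_h in TABLE_4_1:
    computed = ctf_sys(ctf_eye, h_sys, q_h)
    relative_errors.append(abs(computed - tabulated) / tabulated)
    print(freq, round(computed, 4), tabulated)
```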

4.6  Model  Calibration 

The calibration factor (α) in Equations 4.5 and 4.8 is 169.6 (root-Hertz). This value does
not change experiment by experiment or for different sensor types; it is constant
regardless of the system or environment modeled. The value of α was obtained from an
image intensifier (I²) experiment (Vollmerhausen, 1995). During the experiment, Air
Force 3-bar charts were viewed through image intensifiers to determine limiting
resolution versus chart illumination. The experiment was done with both high contrast
(near 1.0) and moderate contrast (0.3) charts. This was an excellent experiment for
determining α for several reasons. The physical characteristics of the sensors were
accurately measured. The measurements were made at illumination levels from 2.88E-6
foot candles to 3.39E-3 foot candles. This variation in illumination means that the tubes
were operated from noise limited to resolution limited conditions. Measurements were
made both with and without laser eyewear protection that reduced the light to the eye by
a factor of ten. Also, the tubes used represented both typical and very good MTF, and
each tube was operated at three gain levels (25000, 50000, and 75000). Light to the eye
varied from as little as 3.6E-4 foot Lamberts (fL) to as much as 12.4 fL. This was an
excellent data set because of the controlled nature of the physical sensor data, the wide
range of scene illuminations, and the large variation of light to the eye.

Three  experienced,  dark-adapted  observers  determined  limiting  resolution  using  Air 
Force  3-bar  charts.  Charts  with  contrast  of  1.0  and  0.4  were  used.  With  the  1.0  contrast 
chart,  data  were  taken  with  and  without  eyewear  protection  for  three  tubes,  three  gains, 
and  five  illumination  levels.  Data  were  taken  for  three  tubes,  one  gain,  and  five 
illumination  levels  with  the  0.4  contrast  chart  and  no  eyewear.  A  modified  version  of 
Equation 10 was used to predict limiting frequency visible for each illumination, tube,
tube gain, eyewear, and chart contrast condition. The model modification involved
correcting the theory to predict for 3-bar charts versus the continuous sinewaves assumed
when measuring CTF. A value for α of 169.6 root-Hertz provided the best fit based on
average error between model and data.

Figure 4.9 compares the laboratory data to model results for all 105 data points. The
abscissa plots the observed bar resolution and the ordinate is model resolution
predictions. If the model were perfect (and if the signal to noise, gain, and MTF
measurements of the tube and optics were perfect), then all the points in Figure 4.9
would lie on the straight line. The model predictions are excellent; the square of the
Pearson coefficient is 0.98 and the RMS error is 0.057.


Figure  4.9  Plot  showing 
experimental  data  versus  model 
predictions.  Perfect  predictions 
would lie on the diagonal line.


5 Definition of Target Acquisition Tasks

This  section  defines  target  acquisition  tasks  and  discusses  model  assumptions  about 
accomplishing  those  tasks.  A  model  is  an  algorithm  or  group  of  inter-related  equations 
based  on  a  set  of  assumptions.  Mathematical  models  are  rigid  in  their  application;  they 
apply  only  where  circumstances  match  the  assumptions.  This  is  certainly  true  for  the 
target  acquisition  model. 

Two  topics  are  discussed  in  this  section.  First,  the  basic  targeting  tasks  are  defined;  these 
tasks  include  target  detection,  recognition,  and  identification  (ID).  Next,  the  meaning  of 
the  probabilities  predicted  by  the  model  is  described. 

The  meaning  of  target  detection  varies  with  operational  circumstance.  Sometimes  a 
target  is  detected  because  it  is  in  a  likely  place;  sometimes  a  target  is  detected  because  it 
looks  like  a  target  (target  recognition).  Target  detection  is  many  things,  and  therefore  not 
easy  to  model.  Some  analysts  associate  a  degree  of  certainty  with  target  detection;  to 
them,  detection  means  the  object  is  of  military  interest.  This  is  not  consistent  with  current 
war  game  modeling;  hopefully  the  war  games  reflect  operational  practice. 

Search  is  a  process,  not  a  single  event,  and  finding  the  target  generally  occurs  only  after  a 
series  of  false  alarms.  The  observer  searches  with  the  imager  in  a  wide  field  of  view; 
when  an  interesting  place  or  object  is  seen,  potentially  a  target,  he  switches  to  a  narrower 
field  of  view  for  a  closer  examination.  In  our  search  experiments,  with  a  high  density  of 
targets  and  a  good  imager,  the  field  of  view  is  typically  switched  three  times  before  a 
target  is  found.  With  a  poorer  sensor  or  a  lower  density  of  targets,  the  field  of  view  is 
switched  many  times.  When  a  target  is  finally  confirmed  in  the  narrow  field  of  view,  it  is 
credited  as  a  detection  in  the  wide  field  of  view.  When  analyses  are  performed  to 
determine  the  resolution  requirements  needed  to  detect  the  target,  the  characteristics  of 
the  wide  field  of  view  are  used.  The  result  is  that,  experimentally,  very  few  “cycles  on 
target”  are  needed  for  detection.  It  must  be  remembered,  however,  that  the  low  cycle 
criteria  are  associated  with  a  high  false  alarm  rate. 

Further,  it  often  occurs  that  the  target  and  sensor  together  play  a  minor  role  in 
determining  the  probability  of  detection.  On  the  left  in  Figure  5.1,  the  target  is  easily 
found.  This  is  a  thermal  image,  and  the  target  is  much  hotter  than  anything  else  in  the 
scene.  On  the  right  in  that  figure,  the  same  target  is  in  the  same  location  with  the  same 
target  to  background  contrast;  only  the  background  objects  have  changed.  The  target  is 
hard  to  find  because  of  clutter.  Clutter  can  affect  target  acquisition  range  by  a  factor  of 
four. Very few sensor design parameters have that much influence on range performance.

Search and detection are important sensor functions, and the war-game community pays a
great  deal  of  attention  to  modeling  search.  However,  search  modeling  is  very  complex 
and  involves  many  factors  beyond  sensor  performance.  These  factors  will  not  be 
discussed  further. 


Figure  5.1  On  left,  hot  target  in  uncluttered  background  viewed  with  thermal 
imager.  On  right,  same  target  but  background  has  become  much  hotter. 


Recognition  involves  discriminating  which  class  of  vehicle  the  target  belongs  in.  In 
Figure  5.2,  there  are  two  trucks,  two  Armored  Personnel  Carriers  (APC)  and  two  tanks. 
In  a  recognition  experiment,  the  observers  (subjects)  are  trained  and  tested  on  the  specific 
target  set.  These  trained  observers  are  shown  the  targets  at  range  (so  the  images  are 
blurred,  noisy,  perhaps  poorly  sampled)  and  asked  to  specify  tank,  truck,  or  APC.  If  the 
observer  gets  the  class  correct,  the  task  is  scored  as  correct.  That  is,  the  observer  might 
mistake  the  T72  tank  in  Figure  5.2  for  the  Sheridan  tank.  He  has  correctly  “recognized” 
that  the  target  is  a  tank.  It  does  not  matter  that  he  incorrectly  identified  the  vehicle. 


Figure 5.2 Side views of a group of vehicles that might be used in a recognition test.
Panel labels: APC, Truck, Tank.

Note  two  important  things  about  recognition.  First,  the  difficulty  of  recognizing  a  vehicle 
depends  on  the  vehicle  itself  and  on  the  alternatives  or  confusers.  Discriminations  are 
always  comparisons.  Task  difficulty  is  established  by  the  set,  not  by  an  individual 
member  of  the  set.  Second,  a  recognition  set  of  targets  like  the  one  shown  in  Figure  5.2 
involves  easy  discriminations  and  more  difficult  discriminations.  APCs  look  much  more 
like  tanks  than  either  tanks  or  APCs  look  like  trucks.  So  the  typical  recognition  task  is 
actually a combination of easy discriminations and more difficult discriminations with
the results averaged. In terms of range performance, the tasks should be modeled
separately.

Target identification requires the observer to make the correct vehicle choice. A set of
targets which might be used in an ID experiment is shown on the left in Figure 5.3. Only
one aspect of each vehicle is shown; experiments use several aspects of each vehicle.
Twelve aspects of the T62 Russian tank are shown to the right in Figure 5.3. Again, the
observers are well trained and tested to ensure that they can correctly identify each individual
vehicle.  The  targets  are  put  at  range  (blurred,  noisy,  perhaps  corrupted  by  poor  sampling) 
and  the  observer  must  indicate  which  target  he  is  shown.  In  this  case,  the  observer  must 
correctly  identify  the  target,  not  just  the  class.  Calling  a  T72  tank  a  T62  tank  is  scored  as 
an  incorrect  choice. 


Figure 5.3 Target set for ID experiments.


The  difficulty  of  the  ID  task  depends  on  the  group  of  targets  selected,  not  the  individual 
target  which  happens  to  be  within  the  sensor  FOV.  The  model  does  not  predict  the 
probability  of  identifying  or  recognizing  individual  vehicles;  the  model  predicts  the 
average  probability  of  correctly  identifying  all  members  of  the  group  at  range.  The 
difficulty  of  the  task  depends  on  how  much  the  members  of  the  group  look  alike.  In 
Figure  5.4,  three  observers  are  trying  to  identify  three  vehicles.  If  the  first  observer  gets 
all  three  vehicles  correct,  the  second  observer  gets  two  correct,  and  the  third  observer  gets 
one  correct,  then  the  probability  of  correct  ID  is  0.67.  That  is,  six  total  correct  calls 
divided  by  nine  total  calls.  This  average  over  both  observers  and  targets  in  the  group  is 
what  the  model  predicts. 
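The averaging over observers and targets can be written out directly. This is a minimal sketch of the three-observer example above; the dictionary keys and variable names are hypothetical.

```python
# Correct calls per observer in the three-vehicle example.
correct_calls = {"observer 1": 3, "observer 2": 2, "observer 3": 1}
vehicles_shown = 3

total_correct = sum(correct_calls.values())          # six correct calls
total_calls = len(correct_calls) * vehicles_shown    # nine calls in all

# Probability of correct ID, averaged over both observers and targets.
p_id = total_correct / total_calls
print(round(p_id, 2))
```

Note that no single observer or single vehicle has a probability of 0.67; the figure describes the group as a whole.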


Figure 5.4 Illustration of How Probabilities are Calculated. Panel labels: Observers,
Target vehicles.

To achieve prediction accuracy, the model requires a group of observers (ten to twenty)
and a group of “like” targets. The group of vehicles in Figure 5.3 are sufficiently alike
that model accuracy is good (less than 0.05 average error in the predicted probabilities,
with the biggest errors occurring at the 0.5 point in the curve where statistical variability
is expected).

A target acquisition discrimination is always a comparison. Is it target A or target B or
target C? Is it a target or background? It is quite common for an analyst to be asked the
question: “Using this sensor, at what range can I identify a T72 Russian Tank?” That
question cannot be answered; it is only partly formulated. The examples below might
clarify this statement.

The Iraqis used T72 Tanks in the 1991 war. One U.S. ally in that war was Egypt;
because of the vagaries of the cold war era, Egypt owns both T62 Russian Tanks and
U.S. built M60 Tanks. The three tanks are shown in Figure 5.5. Because Russian tanks
tend  to  look  alike,  if  our  ally  used  a  T62  Tank,  the  friend-versus-foe  decision  would  be 
more  difficult  than  if  our  ally  used  an  M60  Tank.  So  the  range  at  which  a  T72  can  be 
reliably  identified  depends  on  the  alternative. 


Figure  5.5  Images  of  three  tanks 
illustrating  that  the  probability  of  correct 
ID  depends  on  the  alternatives  presented. 


ID experiments have been performed using the target set shown in Figure 5.3. Probability
of ID versus range is shown in Figure 5.6 by the curve labeled “full target set.” The curve
labeled  “partial  target  set”  shows  the  results  of  an  ID  experiment  using  nine  of  the  twelve 
targets;  the  M109,  T62,  and  T55  have  been  removed  from  the  target  set.  The  T62  and 
T55  look  like  the  T72;  the  M109  looks  a  lot  like  the  2S3.  Removing  these  vehicles  makes 
identifying  the  remaining  targets  easier,  and  this  results  in  a  higher  probability  of  ID. 

Figure 5.6 Probability of ID versus range for the full set of targets shown in Figure 5.3
and for an easier to identify, partial set where the T62, T55 and M109 vehicles are not
used. The vertical axis runs from 0 to 1; the horizontal axis is Range (Kilometers), from
0 to 10.

Target detection, recognition, or identification is determined by a process of seeing
viewpoint-invariant details. The size, contrast, and number of characteristic details visible
to the observer determine the probability of target acquisition. Our model predicts the
quality  of  the  image  and  therefore  the  ability  of  the  observer  to  acquire  the  target. 
However,  targets  are  acquired  by  differentiating  them  from  the  possible  alternatives.  This 
means  that  the  features  which  uniquely  define  a  target  are  those  which  differentiate  that 
target  from  other  targets  or  from  background.  Therefore,  task  difficulty  depends  on  how 
alike the targets look or the level of target-like clutter in the background.


PREDICTING TARGET ACQUISITION PERFORMANCE
FROM IMAGER CTF


Both the Johnson criteria and the Targeting Task Performance (TTP) metric are MTF-based
metrics.  These  metrics  share  the  concept  that  image  quality  can  be  quantified  by  a 
weighted  integral  over  spatial  frequency  of  the  ratio  between  signal  and  CTF.  It  is 
assumed  that  the  excess  modulation  over  threshold  provides  the  information  acted  upon 
by  the  visual  system.  A  great  virtue  of  MTF-based  metrics  is  the  simplicity  of 
implementing  a  range  performance  model;  for  a  specific  task,  it  is  assumed  that  range  is 
proportional  to  the  metric  value. 


The Johnson criteria use the limiting frequency visible at the average target contrast to
quantify image quality and therefore range performance. The Johnson metric is defined
by the spatial frequency range (FJ) over which the apparent target contrast (Ctgt)
exceeds the system contrast threshold [CTFsys(ξ)]. See Figure 6.1 for an illustration of the
Johnson metric.


Figure 6.1 Johnson criteria use the intersection of target apparent contrast
and CTFsys as a measure of image quality for targeting purposes. In
this figure, the intersection occurs at frequency FJ. The horizontal axis is
spatial frequency.


The TTP metric gives weight to the amount that threshold is exceeded at each spatial
frequency; this makes the TTP metric sensitive to image qualities not quantified by the
Johnson methodology. The TTP metric is calculated as shown in Equation 6.1. In this
equation, ξcut is the high spatial frequency above which CTFsys exceeds Ctgt; ξcut equals Fj.
ξlow is the spatial frequency below which CTFsys exceeds Ctgt. Lateral inhibition in the eye
results in CTFsys having a spatial bandpass response; the eye sees intermediate spatial
frequencies better than either very low or high frequencies. However, ξlow is very nearly
zero. Because of the square root, contrast that is well in excess of threshold is not as
important as contrast that just exceeds threshold. The TTP value calculated using
Equation 6.1 is used in lieu of Fj to quantify image quality and predict range
performance.
$$TTP = \int_{\xi_{low}}^{\xi_{cut}} \left[ \frac{C_{TGT}}{CTF_{sys}(\xi)} \right]^{1/2} d\xi \qquad (6.1)$$
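As a rough numerical illustration of Equation 6.1, the sketch below sums the square root of Ctgt/CTFsys over the frequencies where target contrast exceeds threshold. The name `ttp_metric`, the midpoint grid, and the bandpass-shaped `toy_ctf` are assumptions made for illustration only; the sketch also assumes the above-threshold region is a single contiguous band, so that masking is equivalent to integrating from ξlow to ξcut.

```python
import math

def ttp_metric(ctf_sys, c_tgt, xi_max, n=20000):
    """Approximate Equation 6.1 with a midpoint sum.

    ctf_sys : callable giving the system contrast threshold at frequency xi
    c_tgt   : apparent target contrast
    xi_max  : upper end of the frequency grid (cycles per milliradian)
    Only frequencies where c_tgt exceeds ctf_sys contribute to the sum.
    """
    d_xi = xi_max / n
    total = 0.0
    for i in range(n):
        xi = (i + 0.5) * d_xi
        threshold = ctf_sys(xi)
        if c_tgt > threshold:
            total += math.sqrt(c_tgt / threshold) * d_xi
    return total

def toy_ctf(xi):
    # hypothetical bandpass-shaped threshold: high at very low and high xi
    return 0.005 * (math.exp(0.6 * xi) + 0.05 / xi)

ttp_low_contrast = ttp_metric(toy_ctf, 0.1, 10.0)
ttp_high_contrast = ttp_metric(toy_ctf, 0.2, 10.0)
```

Raising the contrast both widens the band and raises the integrand, so the second value is larger than the first.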


While the Johnson criteria provide reasonable performance estimates in many
circumstances, applying those criteria to sampled imagers generally results in pessimistic
predictions. In recent years, modelers have developed “workarounds” to use the Johnson
criteria with sampled imagers (Driggers, 2000; Wittenstein, 1999; Bijl, 1998). These
fixes have limited application, however, because they are empirical adjustments of a
basically flawed model. The Johnson criteria “workarounds” do not permit the modeling
of digital image enhancement, for example, because variations in CTFsys below the cutoff
frequency do not affect the metric value. The TTP metric does an excellent job of
predicting the performance of both well-sampled and under-sampled imagers. It also
predicts the performance impact of frequency boost, colored noise, and other
characteristic features found in modern imagers. A summary of some of the experimental
data supporting the TTP metric and illustrating the problems with the Johnson criteria is
provided in Appendices A, B, and C.


6.1  Predicting  Probability  versus  Range 

A range performance model is created by assuming that target acquisition range is
proportional to the image quality metric. That is, the range at which a task can be
performed is proportional to the TTP value calculated in Equation 6.1. For a given target
contrast and size, a given task like target ID, and a selected probability of accomplishing
the task, the range is calculated as shown in Equation 6.2.


$$Range = \frac{\sqrt{A_{TGT}} \; TTP}{N_{required}} \qquad (6.2)$$


For tactical vehicle targets, size is usually taken as the square root of the viewed target
area (Atgt). Nrequired represents task difficulty and desired probability of success; the
value of Nrequired is established experimentally for a particular target set and task. For
vehicle images, the zero range target to background contrast is defined by:

$$C_{TGT-0} = \frac{\sqrt{(\Delta\mu)^{2} + \sigma_{tgt}^{2}}}{2 \, \mu_{scene}} \qquad (6.3)$$


where μscene is the average scene luminance (or temperature) in the vicinity of the target,
Δμ is the difference in average luminance (temperature) between the target and local
background, and σtgt is the standard deviation of the target luminance (temperature).
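The contrast definition is simple to evaluate. The sketch below assumes Equation 6.3 has the form sqrt(Δμ² + σtgt²) / (2 μscene); the function name `zero_range_contrast` and the sample numbers are hypothetical.

```python
import math

def zero_range_contrast(delta_mu, sigma_tgt, mu_scene):
    """Assumed form of Equation 6.3.

    delta_mu  : target minus local background mean luminance (temperature)
    sigma_tgt : standard deviation of target luminance (temperature)
    mu_scene  : average scene luminance (temperature) near the target
    """
    return math.sqrt(delta_mu ** 2 + sigma_tgt ** 2) / (2.0 * mu_scene)

# hypothetical values in consistent units
c_tgt0 = zero_range_contrast(3.0, 4.0, 10.0)
```

Because the mean difference and the internal target variation are root-sum-squared, a target with no average difference from the background can still have nonzero contrast.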

While range is proportional to image quality, the probability of accomplishing a task is
not. To calculate probability with the target at a given range, first use Beer’s law or
MODTRAN to calculate the atmospheric transmission (τ), then calculate the apparent
target contrast at the sensor (Ctgt).

$$C_{TGT} = \tau \, C_{TGT-0} \qquad (6.4)$$
Ctgt is found using Equation 6.4, the TTP value is calculated using Equation 6.1, and
then resolved cycles is calculated using Equation 6.5.


$$N_{resolved} = \frac{\sqrt{A_{TGT}} \; TTP}{Range} \qquad (6.5)$$


An empirically derived Target Transfer Probability Function (TTPF) is used to relate
probability of task performance to the ratio of Nresolved to V50, where V50 is the metric
value needed to accomplish the task with a 0.5 probability. Again, V50 is established
experimentally. The TTPF curve is a logistic function as defined by Equation 6.6.


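$$P = \frac{\left( N_{resolved} / V_{50} \right)^{E}}{1 + \left( N_{resolved} / V_{50} \right)^{E}} \qquad (6.6)$$

where

$$E = 1.51 + 0.24 \left( N_{resolved} / V_{50} \right) \qquad (6.7)$$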

The process is repeated at range intervals to generate a probability versus range function
as shown in Figure 6.2. If the goal is to predict the outcome of a field experiment, then
the probabilities generated with Equation 6.6 are corrected to add chance and to add the
0.1 probability associated with observer mistakes; the probability corrections are
described in Section 6.2.
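The range loop can be sketched in a few lines. The code below is an illustration under stated assumptions: transmission follows Beer's law with a hypothetical extinction coefficient `beta_per_km`, apparent contrast is taken as transmission times zero-range contrast, the exponent is E = 1.51 + 0.24 (Nresolved/V50), and `ttp_of_contrast` is a placeholder callable standing in for the Equation 6.1 integral. Target size in meters divided by range in kilometers gives the angular size in milliradians.

```python
import math

def ttpf(n_resolved, v50):
    """Equations 6.6 and 6.7: probability from resolved cycles."""
    ratio = n_resolved / v50
    e = 1.51 + 0.24 * ratio
    return ratio ** e / (1.0 + ratio ** e)

def probability_vs_range(ranges_km, sqrt_area_m, v50, c_tgt0,
                         beta_per_km, ttp_of_contrast):
    """Repeat the contrast, TTP, and probability steps at each range.

    ttp_of_contrast : callable mapping apparent contrast to a TTP value
                      in cycles per milliradian (hypothetical stand-in).
    """
    results = []
    for r in ranges_km:
        tau = math.exp(-beta_per_km * r)       # Beer's law transmission
        c_tgt = tau * c_tgt0                   # apparent contrast at sensor
        n_resolved = sqrt_area_m * ttp_of_contrast(c_tgt) / r
        results.append((r, ttpf(n_resolved, v50)))
    return results

# toy TTP-versus-contrast curve, for illustration only
curve = probability_vs_range([1.0, 2.0, 4.0], 3.0, 20.0, 0.25, 0.2,
                             lambda c: 40.0 * c ** 0.5)
```

By construction the function returns 0.5 when Nresolved equals V50, which is a convenient sanity check on any implementation.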


Figure 6.2 Typical model output is probability versus range as shown in the figure.


Target  area  in  Equation  6.5  and  target  contrast  in  Equation  6.1  refer  to  averages  over  the 
group  of  targets  involved  in  the  experiment  or  scenario.  The  reasoning  behind  this  is 
discussed  in  Section  6.3. 


Many  imagers  have  different  resolution  characteristics  in  the  horizontal  and  vertical 
dimensions.  In  scanning  thermal  imagers,  for  example,  the  horizontal  resolution  is  often 
much  better  than  the  vertical  resolution.  As  discussed  in  Section  4,  CTFsys  is  calculated 
for  the  two  dimensions  using  Equations  4.5  and  4.8.  Then  the  TTP  metric  is  calculated 
separately  for  each  direction. 


$$TTPH = \int_{\xi_{low}}^{\xi_{cut}} \left[ \frac{C_{TGT}}{CTFH_{sys}(\xi)} \right]^{1/2} d\xi \qquad (6.8)$$

$$TTPV = \int_{\eta_{low}}^{\eta_{cut}} \left[ \frac{C_{TGT}}{CTFV_{sys}(\eta)} \right]^{1/2} d\eta \qquad (6.9)$$


The TTP value to use in Equation 6.5 to find Nresolved at each range is then the geometric
mean of the horizontal and vertical TTP values.


$$TTP = \sqrt{TTPH \cdot TTPV} \qquad (6.10)$$

This mean value of TTP determines the Nresolved used in Equation 6.6 to find probability of
target acquisition.
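A minimal sketch of the two-dimensional combination follows. The function names and the sample values are hypothetical; size is in meters and range in kilometers so that their ratio is in milliradians.

```python
import math

def combined_ttp(ttp_h, ttp_v):
    """Geometric mean of the horizontal and vertical TTP values."""
    return math.sqrt(ttp_h * ttp_v)

def resolved_cycles(sqrt_area_m, ttp, range_km):
    """Equation 6.5 with size in meters and range in kilometers."""
    return sqrt_area_m * ttp / range_km

# hypothetical scanning imager: better horizontal than vertical resolution
ttp = combined_ttp(16.0, 9.0)
n_resolved = resolved_cycles(3.0, ttp, 2.0)
```

With these numbers the geometric mean is 12, slightly below the arithmetic mean of 12.5; the geometric mean gives less credit to an imager whose two dimensions are badly mismatched.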


6.2  Meaning  of  Model  Probabilities 

The  probabilities  predicted  by  the  model  are  intended  to  be  used  to  assess  sensor 
“goodness”  for  target  acquisition.  Model  probabilities  have  been  adjusted  to  remove  the 
influence of factors which affect target acquisition probability but which are independent
of  sensor  design.  The  model  probabilities  have  been  corrected  for  chance  and  corrected 
for  non-ideal  observer  performance.  The  relationship  between  model  probabilities  and 
observed  data  is  explained  in  this  section. 

If  an  ID  experiment  is  conducted  using  four  vehicles,  then  there  is  a  0.25  probability  of 
correct  ID  just  by  chance.  As  range  increases,  the  measured  probability  drops  to  0.25,  not 
to  zero.  If  twelve  targets  are  used  in  the  experiment,  then  the  probability  drops  to  0.083  at 
long  range.  If  a  recognition  experiment  is  performed  using  three  classes  of  targets  (tank, 
truck,  and  APC),  then  the  probability  of  getting  the  answer  correct  just  by  chance  is  0.33. 
In  a  wheeled-versus-tracked  classification  experiment,  probability  of  correct  choice  by 
chance  is  0.5  because  there  are  only  two  choices. 

The  probability  of  chance  is  removed  before  using  experimental  data  to  calibrate  the 
model. 

$$\text{Model Probability} = \frac{\text{Measured Probability} - P_{chance}}{1 - P_{chance}} \qquad (6.11)$$

where Pchance is the probability of correctly identifying the target or target class just by
chance. If four targets or target classes are used in the experiment, then Pchance is 0.25.
If twelve targets or target classes are used, then Pchance is 0.083. To compare model
predictions with field data, the above formula is inverted.

$$\text{Predicted Measured Probability} = \text{Model Probability} \, (1 - P_{chance}) + P_{chance} \qquad (6.12)$$

Another  correction  is  made  to  experimental  data  before  comparing  it  to  model 
predictions.  Even  well  trained,  conscientious  people  make  mistakes.  We  observe  a  0.1 
error  rate  which  cannot  be  correlated  to  image  quality,  training,  or  apparent  motivation. 
This  error  rate  is  fairly  consistent  across  the  various  target  acquisition  tasks  (search, 
recognition,  identification).  Some  observers  do  achieve  1.0  probabilities  on  clean  target 
image  sets,  but  when  an  average  over  twenty  observers  is  made,  the  top  probability  is  0.9. 
Our  data  asymptotes  to  0.9  probability  at  close  range. 
Whether this error rate is observed under field conditions is not known by the authors.
Whether that error rate should be represented in the model is a matter of judgment.
Traditionally, however, this drop in probability due to mistakes has not been included in
performance models.

If it is desired to include the base mistake rate for an ensemble of observers, then use
Equation 6.13 rather than Equation 6.12 to relate field measured probabilities to model
probabilities.

$$\text{Predicted Measured Probability} = \text{Model Probability} \, (0.9 - P_{chance}) + P_{chance} \qquad (6.13)$$
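The chance and mistake corrections are easy to get backwards, so a short sketch may help. The function names are illustrative; Pchance is taken as one over the number of targets or target classes, as in the examples above.

```python
def remove_chance(measured_p, n_choices):
    """Equation 6.11: strip the guessing floor from a measured probability."""
    p_chance = 1.0 / n_choices
    return (measured_p - p_chance) / (1.0 - p_chance)

def predict_measured(model_p, n_choices, top_probability=1.0):
    """Equation 6.12 (top_probability=1.0) or 6.13 (top_probability=0.9)."""
    p_chance = 1.0 / n_choices
    return model_p * (top_probability - p_chance) + p_chance

# a four-vehicle ID experiment: a measured 0.25 carries no information
model_p = remove_chance(0.25, 4)
# ensemble of observers: even a perfect model probability tops out at 0.9
ceiling = predict_measured(1.0, 4, top_probability=0.9)
```

Equation 6.12 is the exact inverse of Equation 6.11; Equation 6.13 is not, because it also compresses the range to account for the 0.1 mistake rate.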

6.3  Field  Test  Example 

In a hypothetical field test, eight tactical vehicles are available: M1, BMP, T72, M109,
M113, M2, 2½ ton truck, and a HMMWV. Since six are tracked vehicles, one is a truck,
and the other a HMMWV, the decision is made to drop the truck and HMMWV as being
too dissimilar from the rest of the vehicles. The average dimension (square root of area)
and average contrast for the six tracked vehicles are 3 meters and 4° C, respectively. A
V50 of 20 for identifying this particular group of vehicles is established by experience
and expert judgment.

The model is run to predict probability versus range for the sensor system being
evaluated. Five ranges are selected which span ID probabilities from high to very low.
Because of the vagaries introduced by mistakes, chance, and the many factors which bias
real field data, a system should not be evaluated using only close range, high probability
data.

At each range, three aspects of all targets are presented. If the vehicle has a rear mounted
engine, then the aspects are front, side, and opposite side-rear oblique. If the vehicle has a
front mounted engine, then the aspects are rear, side, and opposite side-front oblique. The
total test consists of 18 target views at five ranges for a total of 90 images.

Images are collected for viewing in the lab, or observers are taken to the field. Certainly,
data taking is simplified if the observers are not in the field. The observer must be
deprived of any clues other than the sensor imagery which might help him identify the
target. The observer’s situational awareness is best limited by separating him from the
test site. However, the output of some sensors is not easily recorded for later display, and
the experiment is best performed in the field.

The observers are trained to ID the vehicles used in the experiment. The observers must
pass a test to prove they can correctly ID all the vehicles before participating in the
experiment. The observers are asked to ID the targets from the sensor imagery. The
average correct ID probability for each range is calculated based on observer responses.
The total experiment yields five (5) data points. To compare the model probabilities to
the actual data collected in the field, model probabilities are adjusted using Equation 6.12
or 6.13 above with 0.167 substituted for Pchance. The choice of which equation to use
depends on the number, experience, and motivation of the observers. Use Equation 6.12
if the experiment involved a few, highly experienced observers. Otherwise, use Equation
6.13.
If the above steps are followed, the model accurately predicts observer performance.
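The bookkeeping for this hypothetical test can be written out as a short script. The five model probabilities below are invented placeholders, not model output; everything else follows the numbers given in the example.

```python
VEHICLES = ["M1", "BMP", "T72", "M109", "M113", "M2"]   # six tracked vehicles
ASPECTS_PER_VEHICLE = 3
RANGES = 5

views = len(VEHICLES) * ASPECTS_PER_VEHICLE    # target views per range
images = views * RANGES                        # images in the whole test
p_chance = 1.0 / len(VEHICLES)                 # six-way ID guessing floor

def predicted_field_probability(model_p, ensemble=True):
    """Equation 6.13 when ensemble is True, otherwise Equation 6.12."""
    top = 0.9 if ensemble else 1.0
    return model_p * (top - p_chance) + p_chance

# hypothetical model probabilities at the five ranges, nearest first
model_probabilities = [0.95, 0.80, 0.50, 0.20, 0.05]
predicted = [round(predicted_field_probability(p), 3)
             for p in model_probabilities]
```

The predicted values run from just under 0.9 at close range down toward the 0.167 chance floor at long range, which is the shape the five measured data points should follow.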


6.4  Estimating  Task  Difficulty  (V50) 

There  is  currently  no  objective  way  to  establish  V50  for  a  target  group  other  than  by 
experiment.  We  have  found,  however,  that  a  careful  process  of  comparative  judgment  can 
provide  good  estimates  for  V50.  That  is,  knowing  the  experimentally  established  value  of 
V50  for  example  target  groupings,  comparative  judgment  can  be  used  to  estimate  the  V50 
for  a  related  group  of  targets. 

This section provides some example target sets and the associated V50 values. Examples
are  given  for  detection,  recognition,  and  ID.  Since  V50  values  are  based  on  experience, 
historical  data  should  also  be  useful  in  establishing  values  to  be  used  in  the  new  model. 
However,  there  are  several  issues  to  consider  when  making  comparisons  between  new 
V50  and  old  N50  values.  In  addition  to  giving  V50  examples,  this  section  discusses  the 
differences  between  historical  values  of  N50  used  with  the  Johnson  criteria  and  values  of 
V50  used  with  the  new  TTP  metric. 

The  Johnson  metric  can  be  thought  of  as  an  integral  over  spatial  frequency. 

$$F_J = \int_{0}^{F_J} 1 \, d\xi \qquad (6.14)$$


where  Fj  is  the  frequency  where  Ctgt  equals  CTFsys;  see  Figure  6.1.  The  “1”  in  the 
integral  is  to  emphasize  that  each  frequency  increment  counts  equally;  if  the  target 
apparent  contrast  exceeds  the  threshold  needed  for  visibility  at  a  particular  frequency, 
then  that  frequency  increment  is  counted  in  the  Johnson  bandwidth. 

The TTP metric value is also an integral over essentially the same frequency range. The
value of ξlow in Equation 6.1 is always small; to a good approximation, ξlow is zero.
Remembering that ξcut equals Fj:


$$TTP = \int_{0}^{F_J} \left[ \frac{C_{TGT}}{CTF_{sys}(\xi)} \right]^{1/2} d\xi \qquad (6.15)$$


Within the limits of integration, the ratio Ctgt/CTFsys is always greater than one. This
means that the value of TTP is always greater than Fj. The ratio between the Johnson metric
and the TTP metric is not fixed; if the ratio were fixed, then the two metrics would provide
identical performance predictions. However, for those cases where both metrics predict
performance well, the ratio of TTP value to Johnson metric value is approximately 2.7:1.
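The inequality between the two metrics follows directly from comparing the integrands. A short derivation, taking ξlow as zero as the text does:

```latex
\frac{C_{TGT}}{CTF_{sys}(\xi)} \ge 1 \quad \text{for } 0 \le \xi \le F_J
\;\Longrightarrow\;
TTP = \int_{0}^{F_J} \left[ \frac{C_{TGT}}{CTF_{sys}(\xi)} \right]^{1/2} d\xi
\;\ge\; \int_{0}^{F_J} 1 \, d\xi = F_J
```

Equality would require the target contrast to sit exactly at threshold across the whole band, so in practice the TTP value is strictly larger.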

This does not mean that the Johnson N50 values can be multiplied by 2.7 to obtain a
V50 for the new model. In the new model, V50 represents the resolved cycles needed to
achieve  a  0.5  probability  independent  of  chance.  Historically,  the  data  used  to  establish 
N50  were  not  corrected  for  chance. 

It  is  not  clear  how  N50  values  for  two-class  discriminations  were  established.  Although 
the  historical  value  of  N50  for  discriminating  wheeled  vehicles  from  tracked  is  1  to  2 
cycles,  a  0.5  probability  of  success  is  actually  achievable  with  zero  cycles.  Since  there  are 
only two classes, success half of the time is guaranteed. For 3-class recognition
(tank-truck-APC), most of the 0.5 probability is attributable to the 0.33 probability of being
correct  just  by  chance.  As  the  number  of  choices  increases,  the  impact  of  using 
uncorrected  data  to  establish  N50  decreases. 

Table  6.1  shows  how  an  N50  based  on  uncorrected  data  must  be  increased  to  be  used  in  a 
model  which  does  remove  probability  due  to  chance.  This  table  is  based  on  the  TTPF 
associated  with  the  Johnson  metric.  The  multiplier  values  are  the  ratio  of  N50  needed  to 
achieve  0.5  probability  without  chance  to  the  N50  needed  when  chance  is  included.  For 
example,  if  a  3-choice  recognition  experiment  (tank-truck-APC)  yields  an  N50  of  3  based 
on  uncorrected  data,  then  the  N50  for  corrected  data  would  be  3  *  1.79  or  5.37.  It  is  easier 
to  achieve  0.5  probability  when  chance  is  included,  so  the  N50  for  uncorrected  data  is 
smaller  than  the  N50  for  corrected  data.  As  the  number  of  choices  increases,  the  impact 
of  chance  on  the  data  decreases,  and  the  ratio  of  the  N50  values  approaches  one. 


Table 6.1

Number of choices   3     4     5     6     8     10    12    20
N50 Multiplier      1.79  1.43  1.3   1.23  1.16  1.12  1.1   1.05

Two  examples  will  illustrate  how  V50  values  for  the  new  model  can  be  derived  from  N50 
values  used  with  the  Johnson  model.  With  the  Johnson  metric,  tank-truck-APC 
recognition  is  modeled  using  an  N50  of  3.  The  N50  for  data  with  chance  removed  is  5.37. 
Multiplying  by  2.7,  the  value  of  14.5  is  the  V50  for  use  in  the  new  model. 

Although  details  are  not  available  on  how  the  “standard”  N50  of  6  for  ID  was  established, 
assume that a 6-choice experiment was used. An equivalent V50 for the new model is
found  by  multiplying  6  by  1.23  and  then  by  2.7  yielding  19.9.  Table  6.2  gives  Johnson 
N50  and  TTP  V50  values  for  a  selection  of  target  acquisition  tasks. 
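The two conversions above can be reproduced with a small lookup. The multipliers are copied from Table 6.1 and the 2.7 ratio is the approximate value quoted in the text; the function and constant names are illustrative.

```python
# Table 6.1: multiplier that removes chance from an uncorrected N50
N50_MULTIPLIER = {3: 1.79, 4: 1.43, 5: 1.3, 6: 1.23,
                  8: 1.16, 10: 1.12, 12: 1.1, 20: 1.05}
TTP_TO_JOHNSON_RATIO = 2.7   # approximate ratio quoted in the text

def v50_from_n50(n50_uncorrected, n_choices):
    """Estimate a TTP V50 from a historical, uncorrected Johnson N50."""
    n50_corrected = n50_uncorrected * N50_MULTIPLIER[n_choices]
    return n50_corrected * TTP_TO_JOHNSON_RATIO

v50_recognition = round(v50_from_n50(3, 3), 1)   # tank-truck-APC
v50_identification = round(v50_from_n50(6, 6), 1)  # assumed 6-choice ID
```

The lookup only covers the numbers of choices listed in Table 6.1; other values would need interpolation or a fresh experiment.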


Table 6.2

Task description                                   N50         TTP V50     TTP V50
                                                   w/ chance   w/ chance   w/o chance
Low clutter thermal detect; Figure 6.3             0.75        2           2
Medium clutter thermal detect; Figure 6.4          1.7         4.6         4.6
Recognize tank-truck-APC; Figure 6.5               3           8.1         14.5
Recognize truck/wheeled-armored/tracked-armored;   3.5         9.45        16.9
  Figure 6.6 (Reference Devitt, 2001)
ID 12-target set; Figure 6.7                       7.8         21.2        23.3
ID 9-target set; Figure 6.8                        6.5         17.6        20.0


Figure  6.3  Example  of  Low 
Clutter,  Thermal  Detect 


Figure  6.4  Example  of  Moderate 
Clutter,  Thermal  Detect 


Figure 6.5 Recognition Tank-Truck-APC. Several aspects of each vehicle would be used in a
recognition experiment.


Figure 6.6 Recognition Tracked-armored/Wheeled-armored/Soft-truck. Experiment involved
many vehicles and aspects; these are examples.


Figure 6.7 Twelve Tracked Military Vehicles


Figure 6.8 Nine Tracked Military Vehicles. Three of the vehicles in Figure 6.7 have been
removed; since those vehicles look like some of the remaining vehicles, this target set is
easier to ID.
Modeling Sampled Imagers

The  sampling  limitations  associated  with  focal  plane  array  (FPA)  imagers  cause  an 
aliased  signal  that  corrupts  the  image.  Aliasing  can  cause  distortion  of  scene  detail;  for 
example,  fence  posts  can  be  fatter,  thinner,  or  disappear  completely.  Aliasing  can  also 
cause display artifacts like line raster. The aliased signal is a function of input image,
pre-sample blur, sampling frequency, and image reconstruction at the display. The model
used  to  predict  the  amount  of  sampling  artifacts  present  in  an  imager  is  described  by 
Vollmerhausen  (2000,  04/2000). 

Aliasing  can  degrade  target  acquisition  performance.  Experiments  to  calibrate  the 
decrease  in  performance  based  on  the  aliased  signal  present  are  described  in  several 
references (Vollmerhausen, 1999; Krapels, 1999; Krapels, 2001; Devitt, 1999). The
technique  for  predicting  sampling  artifacts  and  the  resulting  degradation  in  range 
performance  is  summarized  here.  Examples  showing  the  predictive  accuracy  of  the 
technique  are  described  in  Appendices  A  and  C. 

It has become common practice among engineers to use the term aliasing to refer only to
spurious frequency content that overlaps and corrupts the signal in the original
(pre-sampled) frequency band. Sampling actually causes aliasing at all spatial frequencies.
However, to avoid confusion about the meaning of aliasing, the term spurious response is
used in this paper. The part of the image spectrum which results from sampling, other
than the original frequency content, is referred to as spurious response. That is, in
frequency space, spurious response is the Fourier transform of the sampling artifacts.

The spurious response of a sensor corresponds to artifacts in the sensor imagery; it is a
much better indicator of sampling efficacy than the half sample rate. The spurious
response of a sensor can be described in a manner very similar to the sensor Modulation
Transfer Function (MTF) in that the frequency components of the spurious response may
be plotted similar to an MTF. The greatest barrier in the use of spurious response to
characterize sensor performance is the calibration of human reaction to spurious
response.

The amount of spurious response in an image is dependent on the spatial frequencies that
comprise the scene and on the blur and sampling characteristics of the sensor. However,
the spurious response capacity of an imager can be determined by characterizing the
imager response to a point source. This characterization is identical to the MTF approach
for continuous systems. MTF is a trusted indicator of optical quality. But the need for
good MTF cannot be established until the scenario and task are defined. Good MTF is not
always needed; it is prized because of the potential it provides. The same is true for the
spurious response characteristics of an imager. The actual amount of aliasing cannot be
known without specifying the scene, but the tendency of an imager to generate sampling
artifacts is significant in the same sense that good MTF is significant.

The  effect  of  sampling  on  target  acquisition  is  modeled  with  the  following  procedure. 
First,  the  spurious  response  of  the  imager  is  analyzed;  this  is  done  by  characterizing  the 
shift-variant response of the imager to a point source. Once the amount and nature of the
spurious response are known, experience from target acquisition experiments with sampled
imagery  is  used  to  establish  the  expected  drop  in  performance. 


7.1  Response  Function  of  a  Sampled  Imager 

The response function Rsp(ξ) for a sampled imager is found by examining the impulse
response of the system. This procedure is identical to that used with non-sampled
systems. The function being sampled is hpre(x), the point spread function of the
pre-sampled image. Assume the following definitions:

ξ = spatial frequency (cycles per milliradian)
ν = sample frequency (samples per milliradian)
d = spatial offset of origin from a sample point (in milliradians)
Hpre(ξ) is the pre-sample MTF (optics and detector)
Pix(ξ) is the display MTF (CRT spot, sample and hold, eyeball MTF)

Then the response function Rsp(ξ) is given by the following equation.

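$$R_{sp}(\xi) = \sum_{n=-\infty}^{\infty} H_{pre}(\xi - n\nu) \, e^{-j(\xi - n\nu)d} \, Pix(\xi)
= H_{pre}(\xi) \, e^{-j\xi d} \, Pix(\xi) + \sum_{n \ne 0} H_{pre}(\xi - n\nu) \, e^{-j(\xi - n\nu)d} \, Pix(\xi) \qquad (7.1)$$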

The  response  function  has  two  parts,  a  transfer  term  and  spurious  response  term.  The  n=0 
term  in  Equation  7.1  is  the  transfer  response  (or  baseband  response)  of  the  imager.  The 
transfer  response  does  not  depend  on  sample  spacing,  and  it  is  essentially  the  only  term 
that  remains  for  very  small  sample  spacing.  A  very  well  sampled  imager  has  the  same 
transfer  response  as  a  non-sampled  imager. 

However, a sampled imager always has the additional response terms (the n≠0 terms).
These terms mathematically describe the spurious response. The spurious response terms
in Equation 7.1 are filtered by the display MTF, Pix(ξ), in the same way that the transfer
response  is  filtered.  However,  the  position  of  the  spurious  response  terms  on  the 
frequency  axis  depends  on  the  sample  spacing.  Also,  the  phase  relationship  between  the 
transfer  response  and  the  spurious  response  depends  on  the  sample  phase.  See  Figure  7.1 
for  a  graphical  illustration  of  the  transfer  and  spurious  response  terms. 
Figure 7.1 Notional plot of the sampled imager response function versus spatial frequency.
The pre-sample MTF H(ξ) is replicated at multiples of the sample frequency. The transfer
response is the pre-sample MTF multiplied by the display and eye MTF Pix(ξ).
The spurious response is the pre-sample replicas filtered by Pix(ξ).


7.2  Impact  of  Sampling  on  Range  Performance 

A  number  of  experiments  have  been  performed  to  discover  the  impact  of  spurious 
response  on  targeting  performance.  Based  on  these  experiments,  spurious  response  at 
frequencies  less  than  the  half-sample  rate  (that  is,  in-band  aliasing)  has  little  effect  on 
recognition  or  ID  performance.  It  appears  that  some  effect  occurs  at  long  ranges  where 
acquisition  probabilities  are  low;  this  is  logical  because,  at  long  range,  there  are  very  few 
pixels  on  target.  However,  at  ranges  of  practical  interest,  in-band  corruption  tends  to 
affect  minor  details  but  does  not  change  the  basic  presence  or  location  of  important  cues. 

Out-of-band spurious response, however, tends to mask the underlying image. Line
raster, pixel edges, and other spurious high-frequency content do degrade targeting
performance. The amount of performance degradation depends on the ratio of spurious
content  to  image  content.  The  spurious  response  ratio  (SRRout)  of  integrated  out-of-band 
spurious  response  to  the  integrated  transfer  response  is  a  good  indicator  of  performance 
degradation. 


$$SRR_{out} = \frac{\int_{\nu/2}^{\infty} \text{Spurious response} \; d\xi}{\int_{0}^{\infty} \text{Transfer response} \; d\xi} \qquad (7.2)$$

Many  imagers  have  different  sample  spacings  horizontally  and  vertically;  for  example, 
most  scanning  thermal  imagers  have  better  sampling  in  the  horizontal  direction.  SRRout  is 
calculated  in  the  two  dimensions  independently,  and  the  geometric  mean  is  used  to 
estimate  performance  degradation. 

In real imagers, the display and eye MTF limit the frequency content visible to the
observer. When doing numerical integrals, a practical limit for the upper frequency is 2.5
times the sample frequency. Also, the replicas centered on frequencies above twice the
sample frequency are effectively filtered out. Quite often, the replicas of the pre-sample
MTF overlap in the frequency domain; in Figure 7.1, there is a small overlap between the
first and second replicas. In the overlap region, the signals from different replicas are
root-sum-squared before integration.


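$$SRRH_{out} = \frac{\int_{\nu/2}^{2.5\nu} \sqrt{\sum_{n=-2,-1,1,2} \left[ H_{pre}(\xi - n\nu) \, H_{post}(\xi) \right]^{2}} \; d\xi}{\int_{0}^{2.5\nu} H_{sys}(\xi) \, d\xi} \qquad (7.3)$$

$$SRRV_{out} = \frac{\int_{\mu/2}^{2.5\mu} \sqrt{\sum_{n=-2,-1,1,2} \left[ V_{pre}(\eta - n\mu) \, V_{post}(\eta) \right]^{2}} \; d\eta}{\int_{0}^{2.5\mu} V_{sys}(\eta) \, d\eta} \qquad (7.4)$$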
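A one-dimensional numerical sketch of the out-of-band ratio is given below. It rests on several assumptions that should be checked against the full equations: the replicas n = -2, -1, 1, 2 are root-sum-squared at every frequency (which matters only where they overlap), the transfer response is taken as the product of the pre-sample and post-sample MTFs, and both MTFs are hypothetical Gaussians chosen only to exercise the code.

```python
import math

def srr_out(h_pre, h_post, nu, n=5000):
    """Out-of-band spurious response ratio in one dimension.

    h_pre, h_post : callables returning MTF magnitude at a frequency
    nu            : sample frequency (samples per milliradian)
    Spurious response is integrated from nu/2 to 2.5*nu; transfer
    response is integrated from 0 to 2.5*nu (midpoint sums).
    """
    upper = 2.5 * nu
    d_xi = upper / n
    spurious = 0.0
    transfer = 0.0
    for i in range(n):
        xi = (i + 0.5) * d_xi
        post = h_post(xi)
        transfer += h_pre(xi) * post * d_xi
        if xi > nu / 2.0:
            rss = math.sqrt(sum((h_pre(xi - k * nu) * post) ** 2
                                for k in (-2, -1, 1, 2)))
            spurious += rss * d_xi
    return spurious / transfer

def gaussian_mtf(scale):
    # hypothetical Gaussian-shaped MTF, symmetric in frequency
    return lambda xi: math.exp(-(xi / scale) ** 2)

soft_display = srr_out(gaussian_mtf(1.0), gaussian_mtf(0.4), 1.0)
sharp_display = srr_out(gaussian_mtf(1.0), gaussian_mtf(2.0), 1.0)
```

A display with a narrow post-sample MTF suppresses the replicas and gives the smaller ratio; a sharp display passes them and gives the larger one.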


When predicting the probability of accomplishing a task at range, sampling artifacts
reduce the resolved cycles.

$$N_{sampled} = N_{resolved} \sqrt{(1 - 0.58 \, SRRH_{out})(1 - 0.58 \, SRRV_{out})} \qquad (7.5)$$


Nresolved is the resolved cycles on target calculated using Equation 6.5. Nsampled is used in
lieu of Nresolved in Equation 6.6 to calculate probability.
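The sampling adjustment can be sketched as a single function. This assumes the degradation factor is the geometric mean of (1 - 0.58 SRRHout) and (1 - 0.58 SRRVout); the coefficient is exposed as a parameter `k` so that a different calibration can be substituted. The function name and sample values are hypothetical.

```python
import math

def sampled_cycles(n_resolved, srr_h_out, srr_v_out, k=0.58):
    """Reduce resolved cycles for out-of-band spurious response.

    k is the assumed calibration coefficient applied to each
    spurious response ratio before taking the geometric mean.
    """
    factor = math.sqrt((1.0 - k * srr_h_out) * (1.0 - k * srr_v_out))
    return n_resolved * factor

# hypothetical imager with equal spurious response in both dimensions
n_sampled = sampled_cycles(20.0, 0.5, 0.5)
```

With no out-of-band spurious response the function returns Nresolved unchanged, so a well-sampled imager is unaffected by the adjustment.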

In these equations,

SRRHout = out-of-band spurious response ratio in horizontal dimension
SRRVout = out-of-band spurious response ratio in vertical dimension
Hpre(ξ) = horizontal pre-sample MTF
Vpre(η) = vertical pre-sample MTF
ξ = horizontal spatial frequency in (milliradian)^-1
η = vertical spatial frequency in (milliradian)^-1
ν = horizontal sample frequency in (milliradian)^-1
μ = vertical sample frequency in (milliradian)^-1
Heye(ξ or η) = eyeball MTF
Helec(ξ) = horizontal electronics MTF
Velec(η) = vertical electronics MTF
Hdsp(ξ) = horizontal display MTF
Vdsp(η) = vertical display MTF
Hsys(ξ) = horizontal system MTF
Vsys(η) = vertical system MTF

Hpost(ξ) = Helec(ξ) Hdsp(ξ) Heye(ξ)
Vpost(η) = Velec(η) Vdsp(η) Heye(η)
7.2.1 Discussion


A  sampling  model  which  ignores  corruption  of  the  baseband  signal  would  seem  to  be 
counter-intuitive.  There  must  be  a  point  at  which  the  original  signal  is  so  corrupted  by 
aliasing  that  a  performance  impact  results.  Experiments  36  and  44  were  run  to  examine 
this  case.  These  experiments  are  described  in  detail  in  Appendices  A  and  C.  Experiment 
36  used  an  ID  task  and  Experiment  44  used  a  recognition  task.  A  large  amount  of  aliasing 
at  frequencies  less  than  the  half-sample  frequency  was  created  by  using  a  very  small 
detector  fill  factor.  These  experiments  support  the  conclusion  that  range  degradation  is 
predicted  based  on  the  out-of-band  spurious  response. 

As described in the appendices, F/2, diffraction limited optics were used with a 256 by
256  staring  array  which  had  a  0.0016  fill  factor  (one  micron  square  detector  on  a  25 
micron  square  pitch).  The  sampled  imagery  appeared  corrupted;  the  internal  details  and 
shape  of  the  target  vehicles  were  significantly  distorted.  Intuitively,  viewing  the  images, 
it  appeared  that  scene  structure  was  destroyed,  not  that  raster  or  display  pixel  structure 
was  obscuring  the  underlying  scene  details. 

Nonetheless,  experimental  results  support  the  conclusion  that  performance  degradation 
due  to  sampling  is  predicted  by  the  amount  of  out-of-band  spurious  response.  This  result 
might  be  more  understandable  when  it  is  realized  that  the  small  detectors  were  generating 
large  amounts  of  out-of-band  energy;  the  in-band  signal  was  being  aliased  in  a  way  that 
created  significant  high  frequency  content  that  was  not  filtered  out  by  even  good  display 
pixel  interpolation.  The  small  fill  factor  did  result  in  a  27%  loss  in  range  performance, 
but  the  performance  loss  was  predictable  from  the  out-of-band  Spurious  Response  Ratio. 

All  of  the  sampling  experiments  have  involved  either  identifying  or  recognizing  targets; 
the  applicability  of  the  model  for  the  detection  task  has  not  been  verified.  As  discussed  in 
Appendices  A  and  C,  the  sampling  adjustment  appears  to  be  optimistic  when  the  targets 
are  at  long  ranges  and  have  few  samples  on  target.  That  is,  sampling  appears  to  have  a 
greater  effect  on  ID  and  recognition  when  the  targets  are  at  long  range  and  are  poorly 
resolved. This might imply that the detection task, which involves few cycles on target, is
more  affected  by  sampling  than  either  ID  or  recognition. 

It  should  be  remembered,  however,  that  recognition  is  the  level  of  discrimination  at 
which the observer knows he is looking at a target. The low cycle criterion associated with
the detection task occurs because of the acceptance of many false alarms. It is possible
that Equation 7.5 needs to be adjusted to accurately predict detection, but that is not
certain. Based on experience using 1st generation thermal imagers in search experiments,
poor vertical sampling led to increased false alarms, not an increase in the number of
cycles needed to detect the target.

Modeling Reflected-Light Imagers

Imagers of reflected light operate in the spectral band between 0.4 and 2.0 microns. This
spectral region comprises the visible band from 0.4 to 0.75 microns and the near infrared
(NIR) band from 0.75 to 2.0 microns. Quite often, light with wavelengths between one
and  two  microns  is  called  short  wave  infrared  (SWIR).  Natural  light  is  abundant  in  the 
0.4  to  2.0  micron  spectral  band.  Figure  8.1  shows  illumination  from  sunlight,  moonlight, 
and  starlight  (including  airglow).  The  visible  band  is  especially  bright  in  the  day,  and  the 
SWIR  is  the  brightest  of  the  three  bands  on  a  moonless  night.  The  figure  shows 
illumination  through  the  atmosphere;  the  moon  and  sun  are  both  at  a  60  degree  zenith 
angle.  There  are  four  distinct  atmospheric  absorption  bands  apparent  in  the  illumination 
spectra;  these  are  at  0.95,  1.1,  1.4,  and  1.9  microns.  These  absorption  bands  also  affect 
atmospheric  transmission;  transmission  over  a  one  kilometer,  horizontal  path  is  shown  in 
Figure  8.2.  In  addition  to  abundant  natural  illumination,  the  clear  atmosphere  is  fairly 
transparent  over  most  of  the  0.4  to  2.0  micron  spectral  band. 


Figure  8.1  Illumination  from  the 
sun,  moon,  and  starlight.  Most  of 
the  “starlight”  illumination  is 
actually  from  air  glow. 


Figure  8.2  Atmospheric 
transmission  over  a  one 
kilometer,  horizontal  path. 

Target and background reflectivities tend to vary with wavelength in this spectral region;
natural and manmade objects tend not to be gray bodies. Figure 8.3 shows the spectral
reflectivity of a foreign paint, sand, gravel, a mixed soil, and dead grass. The paint
closely matches the gravel and soil out to about 1.2 microns and closely matches the sand
beyond 1.2 microns. The paint has very different reflectivity properties from dead grass
(the top curve in the figure) over the entire spectral range. The apparent contrast seen by
the imager depends on the background and also on the spectral band chosen.


Figure  8.3  Spectral 
reflectivities  of  a  foreign, 
tactical-vehicle  paint  and 
various  kinds  of  dirt  and  grass. 


8.1  Staring  Focal  Plane  Arrays 

The  theory  for  a  solid  state  camera  is  developed  in  this  section.  A  diagram  of  a  solid  state 
imager  is  shown  in  Figure  8.4.  A  lens  focuses  light  onto  a  two-dimensional  focal  plane 
array  of  detectors  (the  FPA).  Photo-current  is  generated  in  each  detector  for  a  fraction  of 
each  frame  or  field  interval;  the  stored  charge  is  read  out  and  formatted  for  display. 


Figure 8.4 Diagram of a solid state imager (lens, two-dimensional array of detectors,
video display).


The  calculation  of  photo  current  is  described  in  references  such  as  the  Electro-Optics 
Handbook  (Burle  Industries,  1974).  The  detector  current  from  a  scene  element  is 
calculated  as  follows. 


\[
\text{photocurrent} = \frac{V_{det} H_{det}}{4 F\#^{2}}
\int I(\lambda)\, T(\lambda)\, R(\lambda)\, R_{sp}(\lambda)\, C(\lambda)\, d\lambda \tag{8.1}
\]

where

λ = wavelength in µm

Fo = focal length of objective lens in centimeters

F# = focal length Fo divided by aperture diameter

I(λ) = illumination in watts cm^-2 micron^-1

T(λ) = transmission of atmosphere as a function of λ

R(λ) = spectral reflectance of the scene element as a function of λ

Rsp(λ) = detector response in amperes per watt as a function of λ

C(λ) = objective lens and spectral filter transmission as a function of λ

Vdet and Hdet are vertical and horizontal dimensions of detector active area in
centimeters
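
The spectral integral in Equation 8.1 is usually evaluated numerically from tabulated spectra. The sketch below is one way to do that with a trapezoid rule. It assumes all spectra are sampled on the same wavelength grid and reads the geometric factor as Vdet Hdet / (4 F#^2); the function and argument names are illustrative and do not come from the report.

```python
def trapezoid(xs, ys):
    """Integrate samples ys over the grid xs with the trapezoid rule."""
    return sum(0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
               for k in range(len(xs) - 1))


def photocurrent(wl_um, illum, atm_trans, reflect, resp, optics_trans,
                 v_det_cm, h_det_cm, f_number):
    """Detector photocurrent in amperes, following the form of Equation 8.1.

    wl_um        wavelengths in microns (ascending)
    illum        I: illumination, watts cm^-2 micron^-1
    atm_trans    T: atmospheric transmission (0..1)
    reflect      R: scene spectral reflectance (0..1)
    resp         Rsp: detector response, amperes per watt
    optics_trans C: lens and spectral filter transmission (0..1)
    """
    integrand = [i * t * r * s * c for i, t, r, s, c
                 in zip(illum, atm_trans, reflect, resp, optics_trans)]
    geometry = v_det_cm * h_det_cm / (4.0 * f_number ** 2)
    return geometry * trapezoid(wl_um, integrand)
```

Calling the function once with the target reflectance and once with the background reflectance gives the two spectral integrals used in the rest of the section.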


Let Rt and Rb represent the detector photocurrent spectral integral for targets and
backgrounds, respectively. Because the signal is proportional to the photo-current and
noise is proportional to the square root of the photo-current, the average electron flux per
solid angle is used in the model. The spatial frequency unit is cycles per milliradian. We
want to calculate the average number of electrons per second in a square milliradian
(Eav); this is because noise power spectral density has units of (second-milliradian^2)^-1.
Power spectral density is in the frequency domain; the calculation here is in the space
domain.

\[
E_{av} = \frac{0.5 \times 10^{-6}\, (R_t + R_b)\, F_o^{2}}{V_{pit}\, H_{pit}\, e} \tag{8.2}
\]

where

e = charge on an electron (1.6E-19 Coulombs per electron)

Hpit and Vpit are horizontal and vertical detector pitch in centimeters

The ratio Fo^2 / (Vpit Hpit) gives the number of photo-detectors in a square radian; the E-6
factor converts this to the number in a square milliradian. The unit “square radian” rather
than steradian seems strange; remember, however, that the model treats two dimensions
as a one-dimensional calculation, done twice. Calculations are not really done in
two-dimensional space.
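
As a worked illustration, the sketch below evaluates Equation 8.2, read here as Eav = 0.5 (Rt + Rb) x 1E-6 Fo^2 / (Vpit Hpit e). The function names and the example numbers are assumptions chosen for illustration only.

```python
ELECTRON_CHARGE = 1.6e-19  # coulombs per electron, the value used in the text


def detectors_per_sq_mrad(f_o_cm, h_pit_cm, v_pit_cm):
    """Number of photo-detectors in one square milliradian of the scene."""
    return 1e-6 * f_o_cm ** 2 / (h_pit_cm * v_pit_cm)


def average_electron_flux(r_t, r_b, f_o_cm, h_pit_cm, v_pit_cm):
    """Eav: electrons per second per square milliradian.

    r_t, r_b are the target and background photocurrents in amperes.
    """
    electrons_per_s_per_detector = 0.5 * (r_t + r_b) / ELECTRON_CHARGE
    return electrons_per_s_per_detector * detectors_per_sq_mrad(
        f_o_cm, h_pit_cm, v_pit_cm)


# Hypothetical imager: 10 cm focal length, 25 micron (25e-4 cm) square pitch.
# Each detector subtends 0.25 mrad, so about 16 detectors fit in a square mrad.
print(detectors_per_sq_mrad(10.0, 25e-4, 25e-4))
```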

Equations 4.14 and 4.15 for CTFHsys and CTFVsys can now be written for a solid state
imager. In the following equations, ξ' and η' are dummy variables of integration.

\[
CTFH_{sys}(\xi) = \frac{CTF(\xi/SMAG)}{M_{dsp}\, H_{sys}(\xi)}
\left[ \frac{1}{K_{con}^{2}} +
\frac{\alpha^{2} E_{av}\, QH_{hor}(\xi)\, QV_{hor}}{E_{av}^{2}} \right]^{1/2} \tag{8.3}
\]

\[
CTFV_{sys}(\eta) = \frac{CTF(\eta/SMAG)}{M_{dsp}\, V_{sys}(\eta)}
\left[ \frac{1}{K_{con}^{2}} +
\frac{\alpha^{2} E_{av}\, QH_{ver}\, QV_{ver}(\eta)}{E_{av}^{2}} \right]^{1/2} \tag{8.4}
\]

where

α = 169.6 root-Hertz (a proportionality factor)

ξ = horizontal spatial frequency in (milliradian)^-1
η = vertical spatial frequency in (milliradian)^-1

CTF(ξ/SMAG) = naked eye Contrast Threshold Function; see Appendix E

Kcon = contrast enhancement

B(ξ or η) = the Equation (3.9) eye filters

Heye(ξ or η) = eyeball MTF; see Appendix E

Helec(ξ) = horizontal electronics MTF

Velec(η) = vertical electronics MTF

Hdsp(ξ) = horizontal display MTF

Vdsp(η) = vertical display MTF

Hsys(ξ) = horizontal system MTF

Vsys(η) = vertical system MTF

QHhor = horizontal noise bandwidth for CTFHsys defined by Equation 8.5
QVhor = vertical noise bandwidth for CTFHsys defined by Equation 8.6
QHver = horizontal noise bandwidth for CTFVsys defined by Equation 8.7
QVver = vertical noise bandwidth for CTFVsys defined by Equation 8.8


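\[
QH_{hor}(\xi) = \int \left[ B(\xi'/\xi)\, H_{elec}(\xi')\, H_{dsp}(\xi')\,
H_{eye}(\xi'/SMAG) \right]^{2} d\xi' \tag{8.5}
\]

\[
QV_{hor} = \int \left[ V_{elec}(\eta)\, V_{dsp}(\eta)\,
H_{eye}(\eta/SMAG) \right]^{2} d\eta \tag{8.6}
\]

\[
QH_{ver} = \int \left[ H_{elec}(\xi)\, H_{dsp}(\xi)\,
H_{eye}(\xi/SMAG) \right]^{2} d\xi \tag{8.7}
\]

\[
QV_{ver}(\eta) = \int \left[ B(\eta'/\eta)\, V_{elec}(\eta')\, V_{dsp}(\eta')\,
H_{eye}(\eta'/SMAG) \right]^{2} d\eta' \tag{8.8}
\]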


Equations  8.3  and  8.4  assume  ideal  shot  noise;  other  noise  sources  are  ignored.  This 
assumption  is  realistic  for  most  cameras  under  high  illumination  conditions.  However,  as 
the  light  fails,  noise  sources  other  than  shot  noise  begin  to  dominate. 

Figure  8.5  illustrates  the  read-out  of  a  CCD  imager.  Photo-charges  are  clocked  down,  line 
by  line,  until  they  reach  the  horizontal  shift  register.  After  each  line  is  entered  into  the 
register,  it  is  shifted  out  at  high  speed  through  the  video  amplifier.  In  this  manner,  the 
imagery  collected  in  parallel  at  each  detector  becomes  a  serial  stream.  The  benefit  is  a 
single output line, generally formatted as RS-170 standard video. The penalty is that the
high  speed  video  amplifier  is  noisy. 


Figure 8.5 Diagram of Video Read-out. High bandwidth video amplifier adds
noise to the signal.

The video amplifier noise is typically specified in terms of noise electrons per pixel per
field or frame. Although the noise actually arises in the amplifier or read-out circuitry,
manufacturers provide the equivalent number of noise electrons in order to make
calculation of dynamic range and total noise easier.

A  second  common  source  of  excess  noise  is  dark  current.  Dark  current  is  often  specified 
as  electrons  per  pixel  per  frame.  Sometimes,  dark  current  is  specified  as  current  density; 
for  example,  the  dark  current  might  be  specified  as  100  microamperes  per  square 
centimeter.  In  that  case,  the  active  detector  area  and  frame  time  are  used  to  calculate  dark 
electrons  per  pixel  per  frame.  The  noise  associated  with  dark  current  is  the  square  root  of 
the  number  of  dark  current  electrons. 
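
The conversion from a dark current density to dark electrons per pixel per frame can be sketched as follows. This is a minimal example under the assumptions stated in the comments; the function names and the numbers used are illustrative, not values from the report.

```python
import math

ELECTRON_CHARGE = 1.6e-19  # coulombs per electron


def dark_electrons_per_frame(j_dark_a_per_cm2, h_det_cm, v_det_cm, frame_time_s):
    """Dark electrons collected by one pixel in one frame.

    Assumes the dark current density (amperes per square centimeter) applies
    uniformly over the active detector area.
    """
    dark_current_a = j_dark_a_per_cm2 * h_det_cm * v_det_cm
    return dark_current_a * frame_time_s / ELECTRON_CHARGE


def dark_noise_electrons(dark_electrons):
    """Noise associated with dark current: square root of the electron count."""
    return math.sqrt(dark_electrons)


# Hypothetical pixel: 10 micron square active area, 20 ms frame time.
n_dark = dark_electrons_per_frame(1.6e-9, 1e-3, 1e-3, 0.02)
print(n_dark, dark_noise_electrons(n_dark))
```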


All noise sources are added in quadrature. The noise in one second and one square
milliradian is:

\[
E_{noise} = E_{av} + \left( E_{amp}^{2} + E_{dc} \right) T_{ccd}\,
\frac{F_o^{2}\, 10^{-6}}{H_{pit}\, V_{pit}} \tag{8.9}
\]


where 

Eamp  =  the  amplifier  noise  in  electrons  per  pixel  per  frame 
Edc  =  dark  current  electrons  per  pixel  per  frame 
Tccd  =  fields  or  frames  per  second. 
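
The quadrature sum can be sketched in code. The example below assumes Equation 8.9 has the form Enoise = Eav + (Eamp^2 + Edc) Tccd x 1E-6 Fo^2 / (Hpit Vpit): shot noise variance equals Eav, amplifier noise enters as its square, and dark current noise enters as the dark electron count. The function name and example numbers are illustrative.

```python
def noise_flux(e_av, e_amp, e_dc, t_ccd_hz, f_o_cm, h_pit_cm, v_pit_cm):
    """Enoise: noise (variance) per second per square milliradian.

    e_av     average electron flux, electrons per second per square mrad
    e_amp    amplifier noise, electrons rms per pixel per frame
    e_dc     dark current electrons per pixel per frame
    t_ccd_hz fields or frames per second
    """
    pixels_per_sq_mrad = 1e-6 * f_o_cm ** 2 / (h_pit_cm * v_pit_cm)
    per_pixel_per_second = (e_amp ** 2 + e_dc) * t_ccd_hz
    return e_av + per_pixel_per_second * pixels_per_sq_mrad


# With no amplifier noise and no dark current, only shot noise remains.
print(noise_flux(1e6, 0.0, 0.0, 60.0, 10.0, 25e-4, 25e-4))
```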


The  equations  for  threshold  vision  through  the  imager  now  become; 
(8.11)


Since  amplifier  noise  can  completely  dominate  performance  at  low  illumination  levels, 
techniques  have  been  developed  to  provide  signal  gain  prior  to  the  read-out  electronics. 
Generally,  however,  the  electron  gain  is  non-ideal  in  the  sense  that  the  gain  itself 
generates  excess  noise.  Sometimes  the  amount  of  excess  noise  depends  on  the  gain 
applied.  For  example,  avalanche  silicon  diodes  have  excess  noise  equal  to  the  square  root 
of  the  gain;  a  gain  of  100  comes  at  the  cost  of  increasing  shot  noise  by  a  factor  of  10. 

Let Nf represent the noise factor, which is never less than one. Nf might be a fixed
value or might depend on gain through the equation

Nf = Gain^γ

where γ is an exponent which depends on the technology used; for silicon avalanche
diodes, γ is 0.5. Then Enoise in Equations 8.10 and 8.11 becomes;

If both gain and noise factor are unity, then Equation 8.12 reduces to 8.9.
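
The gain-dependent noise factor is simple to compute. The sketch below reproduces the avalanche diode example from the text (a gain of 100 with an exponent of 0.5 gives a noise factor of 10); the function name is illustrative.

```python
def noise_factor(gain, exponent):
    """Noise factor Nf = Gain ** exponent for a gain-dependent technology."""
    return gain ** exponent


# Silicon avalanche diode: exponent 0.5, gain 100.
print(noise_factor(100.0, 0.5))
```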

8.1.1  Interlace 


Display interlace is used to reduce electronic bandwidth while maintaining a high
resolution image. Electronic interlace, also called standard interlace or simply interlace, is
illustrated in Figure 8.6. The FPA operates at 60 Hertz. However, the display operates at
a 30 Hertz frame rate. The first, third, fifth, every odd line from the FPA is displayed in
the first field. The even lines (two, four, six, etcetera) are displayed in the second field.
Although interlace does not degrade resolution, the displayed signal to noise is affected
because half the available signal from the FPA is discarded.
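
A small sketch of the line selection may help. It assumes that two successive 60 Hertz FPA frames feed one 30 Hertz display frame, with the odd lines taken from the first FPA frame and the even lines from the second; the function name and the data layout (a frame as a list of rows) are illustrative.

```python
def electronic_interlace(frame_a, frame_b):
    """Split two successive FPA frames into the two display fields.

    Each frame is a list of rows; index 0 holds FPA line 1.
    Only half of the lines of each FPA frame are displayed.
    """
    field_one = frame_a[0::2]  # FPA lines 1, 3, 5, ...
    field_two = frame_b[1::2]  # FPA lines 2, 4, 6, ...
    return field_one, field_two


first = ["a1", "a2", "a3", "a4"]
second = ["b1", "b2", "b3", "b4"]
print(electronic_interlace(first, second))
```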


Figure  8.6  Illustration 
of  Electronic  Interlace 

Pseudo interlace is a means for using all of the signal electrons while maintaining the
reduced  bandwidth  benefits  of  interlace.  In  the  first  display  field,  photo-electrons  from 
pixels  in  rows  one  and  two  are  added  and  presented  on  display  line  1.  Pixels  on  lines 
three  and  four  are  added  and  presented  on  display  line  3.  The  process  continues,  adding 
odd  lines  to  even  lines  and  displaying  on  odd  lines.  In  field  two,  FPA  lines  two  and  three 
are  added  and  presented  on  display  line  2.  Even  FPA  lines  are  added  to  odd  lines  and 
displayed  on  the  even  lines.  This  process  is  illustrated  in  Figure  8.7.  Pseudo  interlace  uses 
all  of  the  available  signal  electrons  and  therefore  maintains  image  sensitivity.  Also,  field 
alignment  is  properly  maintained;  samples  are  in  the  correct  location.  The  penalty  paid  is 
a  decrease  in  the  vertical  MTF  of  the  imager. 
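
The row summation described above can be sketched as follows. The example assumes each FPA frame is a list of rows of numeric pixel values, with index 0 holding FPA line 1; the function names are illustrative.

```python
def add_rows(row_a, row_b):
    """Pixel-by-pixel sum of two FPA rows."""
    return [a + b for a, b in zip(row_a, row_b)]


def pseudo_interlace(frame_a, frame_b):
    """Form the two display fields from two successive FPA frames.

    Field one adds lines 1+2, 3+4, ... (shown on display lines 1, 3, ...).
    Field two adds lines 2+3, 4+5, ... (shown on display lines 2, 4, ...).
    """
    field_one = [add_rows(frame_a[k], frame_a[k + 1])
                 for k in range(0, len(frame_a) - 1, 2)]
    field_two = [add_rows(frame_b[k], frame_b[k + 1])
                 for k in range(1, len(frame_b) - 1, 2)]
    return field_one, field_two


frame = [[1], [2], [3], [4], [5]]
print(pseudo_interlace(frame, frame))
```

Because each displayed line is the sum of two adjacent FPA lines, every signal electron is used, at the cost of averaging over two rows vertically.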


Figure  8.7  Illustration 
of  Pseudo  Interlace 

In Equations 8.10 and 8.11, Eav is divided by two for standard interlace but is not affected
by pseudo interlace. Enoise in Equation 8.12 is affected as shown in Equation 8.13 where
Isignal and Iamp are defined;

Iamp = Isignal = 1 for non-interlace
Iamp = 2 for any interlace
Isignal = 2 for electronic interlace and
        = 1 for pseudo interlace.
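
The interlace factors defined above amount to a small lookup table. A minimal sketch, with the mode names chosen here for illustration:

```python
# (Iamp, Isignal) for each display mode, as defined in the text.
INTERLACE_FACTORS = {
    "none": (1, 1),
    "electronic": (2, 2),
    "pseudo": (2, 1),
}


def interlace_factors(mode):
    """Return (Iamp, Isignal) for 'none', 'electronic', or 'pseudo'."""
    try:
        return INTERLACE_FACTORS[mode]
    except KeyError:
        raise ValueError("unknown interlace mode: %r" % (mode,))


i_amp, i_signal = interlace_factors("pseudo")
print(i_amp, i_signal)
```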

8.1.2 Snapshot and Frame Integration

Temporal  integration  of  the  eye  varies  with  light  level.  As  illumination  decreases,  the  eye 
integrates  for  a  longer  period.  If  detector  noise  is  temporally  varying  at  a  fairly  rapid  rate 
(50  or  60  imager  fields  per  second  is  adequate),  then  the  eye  temporally  filters  detector 
noise  in  the  same  way  as  eye  noise.  However,  if  a  snapshot  (single  frame)  is  taken,  or  if 
frame  integration  is  used,  then  the  effect  of  eye  integration  time  must  be  explicitly 
considered. 


The dependence of eye integration time on display luminance is:

teye = 0.0192 + 0.0625 (L / 1.076)^-0.17

where L is display luminance and teye is integration time.

For snapshot imagery, define tact as:

tact = frame time for non-interlace and pseudo interlace

     = half of a frame time for electronic interlace.

Then Enoise for snapshot (Enoi-snap) is related to Enoise for framing shown in Equation 8.9
by;

If frame integration is used, then the effect depends on whether the imager is in framing or
snapshot mode. If in snapshot mode, then the benefit of integrating Eint frames is;

If the imager is in framing mode, then the benefit of frame integration is moderated by
the fact that the eye is already integrating temporally.

8.2 Direct View Image Intensifiers

Image  intensifiers  amplify  moonlight  and  starlight  at  spectral  wavelengths  between  0.5 
and  1.0  micron.  To  the  left  in  Figure  8.8,  a  pilot  is  wearing  the  Aviator’s  Night  Vision 
Imaging  System  (ANVIS)  which  consists  of  two  oculars,  one  for  each  eye.  A  schematic 
of  one  direct  view  goggle  ocular  is  shown  at  right.  The  objective  lens  forms  an  inverted 
image of the scene on the image intensifier tube. The I² tube amplifies the brightness of
the  image  as  described  below.  The  fiber-optic  twist  erects  the  brighter  image.  The 
eyepiece  creates  a  unity  magnification,  virtual  image  of  the  scene,  allowing  the  pilot  to 
fly  at  night  without  lights.  By  modifying  the  eyepiece  to  create  image  magnification,  a 
single  ocular  can  also  be  an  effective  rifle  sight. 


Figure 8.8 ANVIS goggle shown at left; at right
is a schematic diagram of a single ocular (eyepiece
lens, fiber optic twist, I² tube, objective lens).

Operation of the I² tube is illustrated in Figure 8.9. Photons from the scene generate
photo-electrons  in  the  cathode.  A  high  voltage  accelerates  the  photo-electrons  to  the 
micro  channel  plate  (MCP).  The  MCP  consists  of  millions  of  tiny  channels;  these 
channels  are  about  five  microns  in  diameter  on  a  pitch  of  six  microns.  The  channel  length 
to  diameter  ratio  is  about  seventy.  Operation  of  the  MCP  is  shown  by  the  blowup  of  a 
single  channel  at  the  bottom  of  the  figure.  Photo-electrons  enter  the  channel  and  are 
accelerated  by  a  high  voltage  across  the  channel  plate.  Secondary  electrons  are  emitted 
when  the  photo-electrons  strike  the  channel  wall.  The  secondary  electrons  are  then 
accelerated,  strike  the  wall,  and  create  more  electrons.  Electron  gain  through  the  channel 
is  controlled  by  varying  the  voltage  across  the  MCP.  Channel  electrons  exit  the  MCP  and 
are  accelerated  by  another  high  voltage  to  the  phosphor  where  an  image  is  formed. 
Brightness  gain  results  from  the  MCP  electron  gain,  the  energy  gained  from  electron 
acceleration  between  the  MCP  and  phosphor,  and  from  the  fact  that  the  cathode  is 
sensitive  to  a  much  broader  range  of  light  wavelengths  than  the  eye. 

Brightness gain is specified by the ratio of foot Lamberts from the phosphor to foot candles
on the cathode. Typical gain is 30,000 but gains to 100,000 are possible; however,
excessive gain leads to bothersome scintillation in the image. Brightness output of the I²
tube is controlled by limiting the current available to the MCP; generally, goggle
brightness is limited to about 3 fL. Tube noise factor is ideally 1.4 based on the open area
ratio (not all photo-electrons get through the MCP); typical noise factor is about two. See
Bender (2000) for a more thorough discussion of the theory and specification of image
intensifiers.

Figure 8.9 Illustration of Tube Operation, with a blowup of a single channel.


Equation 8.17 is used to find the photocurrent for one square centimeter of cathode area.

\[
\text{photocurrent} = \frac{1}{4 F\#^{2}}
\int I(\lambda)\, T(\lambda)\, R(\lambda)\, R_{sp}(\lambda)\, C(\lambda)\, d\lambda \tag{8.17}
\]

\[
E_{av} = 3.13 \times 10^{12}\, (R_t + R_b)\, F_o^{2}\, B_{io} \tag{8.18}
\]

In Equation 8.18, Rt and Rb are the photocurrent integrals in amperes per square
centimeter for target and background, respectively. The 3.13E12 factor is the product of
0.5 to average target and background flux, 1E-6 to convert square radians to square
milliradians, and division by the charge on an electron (1.6E-19 Coulombs per electron).
The Bio factor accounts for the improved signal to noise available from systems with two
image intensifier tubes.

Bio = 1 for monocular or biocular (one I² tube)

    = 2 for binocular (two I² tubes)
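
The 3.13E12 constant can be checked directly from the three factors named above, and Equation 8.18 is then a one-line calculation. A minimal sketch; the function names and the example inputs are illustrative.

```python
ELECTRON_CHARGE = 1.6e-19  # coulombs per electron


def eav_constant():
    """0.5 (average of target and background) x 1E-6 / electron charge."""
    return 0.5 * 1e-6 / ELECTRON_CHARGE


def average_electron_flux_i2(r_t, r_b, f_o_cm, binocular=False):
    """Eav per Equation 8.18; r_t and r_b in amperes per square centimeter."""
    b_io = 2 if binocular else 1
    return eav_constant() * (r_t + r_b) * f_o_cm ** 2 * b_io


# The constant evaluates to 3.125E12, which the text rounds to 3.13E12.
print(eav_constant())
```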

In image intensifiers, dark current is called equivalent background input (EBI). Although
generally not important at room temperature, EBI can be significant in very hot
environments or if the I² tube is enclosed with other hardware. The unit for EBI is foot
candles (lumen per square foot) of 2856 K light. Tube specification sheets generally list
the responsivity of the tube (Resp) in µamps per lumen of 2856 K light. So the dark current
(DCebi) per square centimeter of cathode area is:

DCebi = EBI Resp 1E-6 / 929.03 (8.19)

where the 929.03 factor converts square feet to square centimeters. The noise electrons
(Enoise) in one second and one square milliradian is:

Enoise = Eav + 6.25E6 DCebi Fo^2 Bio (8.20)

In order to establish the eye CTF, the output brightness of the tube (Bout) must be
calculated. First the current to light gain (Gelec in fL cm^2/µamp) is calculated from a
knowledge of tube gain (Gtube) in fL/fc and tube responsivity.

Gelec = Gtube 929.03 / Resp (8.21)

For an eyepiece transmission of Teye, the output brightness is:

Bout = 0.5 (Rt + Rb) Gelec Teye + EBI Gtube Teye (8.22)
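
Equations 8.19, 8.21, and 8.22 chain together as sketched below. The function names are illustrative. One assumption is flagged in the comments: since Gelec is expressed per µamp, the sketch takes Rt and Rb in Equation 8.22 to be in µamps per square centimeter so that the units resolve to fL.

```python
SQ_CM_PER_SQ_FT = 929.03


def ebi_dark_current(ebi_fc, resp_uamp_per_lumen):
    """DCebi per Equation 8.19: dark current per square centimeter of cathode."""
    return ebi_fc * resp_uamp_per_lumen * 1e-6 / SQ_CM_PER_SQ_FT


def current_to_light_gain(g_tube_fl_per_fc, resp_uamp_per_lumen):
    """Gelec per Equation 8.21, in fL cm^2 per microamp."""
    return g_tube_fl_per_fc * SQ_CM_PER_SQ_FT / resp_uamp_per_lumen


def output_brightness(r_t, r_b, g_elec, t_eyepiece, ebi_fc, g_tube_fl_per_fc):
    """Bout per Equation 8.22, in fL.

    Assumption: r_t and r_b are in microamps per square centimeter, matching
    the current unit in g_elec.
    """
    signal = 0.5 * (r_t + r_b) * g_elec * t_eyepiece
    background = ebi_fc * g_tube_fl_per_fc * t_eyepiece
    return signal + background
```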

The equation for horizontal and vertical threshold vision through the imager is;

where Eav / Enoise is the contrast degradation Mdsp due to EBI. For direct view I² systems,
MTF loss is associated with the optics (Hopt), the tube (Htube), and the eyepiece (Hep).
Since little of the tube MTF is associated with the cathode, tube MTF filters the noise.

8.3 Optically Coupled to CCD or CMOS

The  eyepiece  in  Figure  8.8  can  be  replaced  by  a  CCD  or  CMOS  imager  and  display;  this 
allows  the  image  intensifier  to  be  mounted  remotely  from  the  observer.  A  fiber-optic 
minifier  or  optical  relay  lens  is  used  because  the  image  intensifier  format  is  generally 
twice as large as the CCD image array. See Figure 8.10 for a schematic diagram of an I²
CCD camera.


Figure  8.10  Illustration  of 
Tube  Optically  Coupled 
to  CCD  or  CMOS  FPA 

The MTF of the CCD and display is applied to the I² tube signal and noise. HCCD and
VCCD are the horizontal and vertical CCD MTF, respectively. The CCD noise is filtered
by the CCD MTF, display MTF, the eyeball, and the perceptual filter. The CCD noise is
added in quadrature with the other noise terms; this means that CCD noise must be
expressed in terms of cathode photoelectrons.

where

Eccd = CCD noise expressed as I² cathode photo-electrons

Kcst = 4.53E13 = constant factors (charge on electron, units conversion)

Eamp  =  Amplifier  noise  per  CCD  pixel  per  field  in  electrons, 

Tccd  =  field  rate  of  CCD 

Rccd  =  footcandle  to  generate  one  electron  in  a  CCD  pixel  each  second 

Apix  =  area  of  a  CCD  pixel  =  Hpit  Vpit 

Kjj  and  Kp  are  horizontal  and  vertical  reduction  ratios 

Bout = light output from I² tube in fL

The  shorthand  “CCD”  is  used  to  represent  the  array,  but  the  technology  used  for  the  solid 
state  imager  is  not  relevant.  Also,  optical  coupling  can  be  by  coherent,  fiber  optic  reducer 
as  shown  in  the  figure,  or  a  relay  lens  can  be  used. 

All calculations for the I² tube remain the same as in Equations 8.18 through 8.26 except
for the addition of CCD and display MTF. The noise now has two terms because CCD
noise is filtered differently than I² tube noise. EHhor and EVhor are the spatial filters for
calculating horizontal CCD noise. EHver and EVver are the spatial filters for calculating
vertical CCD noise. In the following, ξ' and η' are dummy variables of integration.

Vollmerhausen (1996) provides three validation examples; these show a good match
between model predictions and experimental data. Appendix E provides details on how to
model CCD MTF and fiber-optic taper MTF.

It is important to realistically assess display performance when modeling I² CCD
cameras. This is particularly true when modeling low illumination levels, because the
camera electronic output might not be sufficient to properly drive the display. As a result,
the best “operator optimized” image might have poor display contrast. During the
validation experiments described in Vollmerhausen (1996), operator-selected display contrast
ranged between 10 percent and 40 percent when the various cameras were used under
overcast starlight illumination.

Under low target-illumination conditions, some I² CCD cameras will output only
millivolts  of  video  signal.  Cathode  ray  tubes  have  a  power  law  relationship  between  the 
input  voltage  and  the  output  luminance.  At  maximum  gain  and  with  no  brightness  control 
offset,  typical  displays  will  provide  very  little  output  luminance  when  the  input  voltage  is 
only  a  few  millivolts.  Typical  gamma  correction  circuits  do  not  correct  inputs  this  low. 

Adding  display  brightness  with  the  brightness  control  will  move  the  image  up  the  power 
law  curve,  providing  a  larger  luminance  change  for  a  given  change  in  input  voltage. 
Adding  brightness  will  also  make  the  whole  display  brighter,  improving  the  human  visual 
response.  As  a  result  of  the  two  properties  together,  the  display  might  have  the  best 
subjective  appearance  with  minimum  display  luminance  greater  than  zero.  The  operator 
will  choose  poor  contrast  over  no  or  very  low  luminance.  Since,  in  this  instance,  CTF  is 
inversely  proportional  to  display  contrast,  the  display  characteristics  can  be  a  dominant 
factor  in  determining  system  performance. 


8.4  CCD  or  CMOS  Array  inside  Tube 

The array can be inside the vacuum of the image intensifier tube. Electrons are directly
gathered by the CCD rather than optically coupling the CCD to the I² tube phosphor
output. This is illustrated in Figures 8.11 and 8.12. In Figure 8.11, electrons are accelerated
from the cathode to the CCD by a high voltage. The photo-electrons are given sufficient
energy to create 100 to 200 secondary electrons when the CCD silicon is struck. This
provides near-ideal electron gain. In Figure 8.12, an MCP is used. The MCP adds
complexity but provides advantages. The MCP provides gain control; the cathode to
CCD voltage in Figure 8.11 cannot be lowered too much, or the image will blur. Also,
with the MCP, secondary electrons at the CCD are not necessary; the CCD (or CMOS
array) is just a collector of electrons and need not provide gain. The arrangement in
Figure 8.12 is expected to prolong CCD array lifetime.


Figure 8.11 CCD array
inside I² tube vacuum.

Figure 8.12 CCD array inside I²
tube vacuum; this arrangement
also has an MCP.


CCD noise Eccd is calculated differently from that used with the optically coupled
arrangement. Otherwise, Equations 8.30 through 8.39 are used to calculate CTF for these
imagers. Using the same Rt and Rb as in Equation 8.18 for the photo-current per square
centimeter of cathode area, the Eccd is:


^CCD  = 


\E-6  +  0.5{Rp  +  Rp  )^det^det^e/ec  VcCD 

^ pit^ pit  ^elec 


(8.40) 


where Gelec is the electron gain. This Eccd is used in Equations 8.34 and 8.39.


8.5  Predicting  Probability  versus  Range 

8.5.1  Contrast  Transmission  through  the  Atmosphere 

When  predicting  contrast  transmission,  certain  assumptions  are  made  to  simplify 
calculations.  These  assumptions  constrain  the  scenario  for  which  the  model  is 
appropriate. 

a) The target and background are co-located; the target is viewed against local
terrain. Range to the target and range to the background are the same. From a
military standpoint, this is not an unreasonable assumption, and it relieves the
necessity to consider some complex situations where target to background
contrast can actually reverse.

b)  Contrast  loss  through  the  atmosphere  is  from  scattering.  Contrast  is  not 
affected by absorption in the atmosphere. As shown in Figures 8.1 and 8.2, the
atmospheric  absorption  bands  remove  light  from  the  illumination.  Most  of  the 
atmospheric  path  occurs  before  the  light  hits  the  target  and  background. 
Atmospheric  absorption  is  considered  when  predicting  spectral  illumination. 

c)  Average  luminance  seen  by  the  imager  does  not  change  with  range.  Target-to- 
background  signal  disappears  into  the  average  luminance  established  by  the  target 
and  background  reflectivities.  CTFsys  (or  MRC)  depends  on  the  light  entering  the 
sensor;  noise,  for  example,  is  proportional  to  the  square  root  of  average 
luminance. In order to use a single, pre-calculated CTFsys to represent imager
performance,  the  assumption  must  be  made  that  luminance  does  not  change  with 
range. 

d)  Contrast  is  reduced  by  scattering  of  target  signal  out  of  the  line  of  sight  and  by 
sunlight, moonlight, or starlight scattered by the atmosphere into the imager’s field
of  view.  See  Figure  8.13  for  an  illustration.  In  most  scenarios,  path  radiance 
caused  by  light  scattered  into  the  sensor’s  path  is  the  most  serious  cause  of  target- 
to-background  contrast  loss.  The  atmospheric  path  can  appear  brighter  at  the 
imager  than  the  zero  range  target  and  background;  this  results  in  substantial  loss 
of  contrast.  This  part  of  the  model  is  not  completely  self-consistent,  since  the 
luminance  viewed  by  the  imager  is  increasing  with  range  under  these 
circumstances.  However,  the  approximation  that  the  luminance  is  constant  does 
not  generally  lead  to  serious  errors.  The  most  important  factor  is  that  contrast  is 
greatly  reduced  by  the  atmosphere. 

e)  Path  radiance  is  quantified  by  the  Sky-to-Ground-Ratio  (SGR).  As  the 
atmospheric  path  lengthens,  the  path  becomes  brighter.  At  some  point,  the  path 
becomes  “optically  thick.”  That  is,  only  light  from  the  path  is  seen,  and  increasing 
the  path  length  does  not  change  the  path  radiance  because  as  much  light  is 
scattered  out  as  in.  The  SGR  is  the  ratio  between  the  maximum  path  radiance  and 
the  zero  range  radiance.  SGR  does  not  vary  with  range  because  the  peak,  long 
range  value  is  used  in  the  ratio.  Table  8.1  gives  values  of  SGR  for  a  range  of 
environments.  Figure  8.14  shows  the  effect  of  SGR  on  contrast  transmission. 
Equation 8.41 is used to calculate contrast loss for range Rng, Beer’s Law
coefficient β, and target zero-range contrast Ctgt-0.

\[
C_{tgt} = \frac{C_{tgt-0}}{1 + SGR \left[ \dfrac{1}{\exp(-\beta R_{ng})} - 1 \right]} \tag{8.41}
\]
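
The contrast loss calculation can be sketched in a few lines. The example reads Equation 8.41 as Ctgt = Ctgt-0 / (1 + SGR (1/τ - 1)), where τ = exp(-β Rng) is the Beer’s Law transmission over the path; the function name and example values are illustrative.

```python
import math


def apparent_contrast(c_zero_range, sgr, beta_per_km, range_km):
    """Target contrast at range, following the form of Equation 8.41."""
    transmission = math.exp(-beta_per_km * range_km)
    return c_zero_range / (1.0 + sgr * (1.0 / transmission - 1.0))


# Beer's Law transmission of 0.9 per kilometer, "typical" SGR of 3.0,
# zero-range contrast of 0.3, target at 2 km.
beta = -math.log(0.9)
print(apparent_contrast(0.3, 3.0, beta, 2.0))
```

With SGR equal to one, the expression reduces to the zero-range contrast multiplied by the path transmission; larger SGR values reduce contrast faster with range.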

Figure 8.13 Sunlight scattered from atmosphere
degrades target-to-background contrast.

Figure 8.14 Effect of SGR on Contrast Transmission. Left shows effect
when the Beer’s Law transmission is 0.9 per kilometer. Right shows
effect with 0.4 per kilometer transmission.
Table 8.1 Typical Values for SGR

Terrain    Clear    Overcast
Desert     1.4      7.0
Forest     5        25
Typical    3.0
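The contrast loss described in item e) can be sketched numerically. The following is a minimal illustration of Equation 8.41, assuming the Beer's Law transmission over the path is exp(-β Rng); the function names and the 5 km example range are hypothetical and not part of the model software.

```python
import math


def beta_from_transmission(tau_per_km):
    """Beer's Law coefficient (per km) for a given transmission per kilometer."""
    return -math.log(tau_per_km)


def contrast_transmission(range_km, beta_per_km, sgr):
    """Ratio C_tgt / C_tgt-0 from Equation 8.41."""
    tau = math.exp(-beta_per_km * range_km)  # Beer's Law transmission over the path
    return 1.0 / (1.0 + sgr * (1.0 / tau - 1.0))


if __name__ == "__main__":
    # SGR values taken from Table 8.1; transmissions from the Figure 8.14 caption.
    for tau_km in (0.9, 0.4):
        beta = beta_from_transmission(tau_km)
        for sgr in (1.4, 3.0, 7.0):
            ratio = contrast_transmission(5.0, beta, sgr)
            print(f"tau={tau_km}/km  SGR={sgr}: C_tgt/C_tgt-0 at 5 km = {ratio:.3f}")
```

With SGR equal to one, the contrast ratio reduces to the Beer's Law transmission itself; larger SGR values reduce contrast faster than transmission alone.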

8.5.2  Effect  of  Contrast  Enhancement 

In Equations 8.3, 8.4, 8.34, and 8.39, each expression for CTFHsys and CTFVsys
has two or more terms. One of the terms represents eye contrast limitations and depends
on Kcon; the other term(s) depend on sensor noise and are independent of Kcon. In use, an
imager  may  or  may  not  have  the  contrast  optimized  to  view  the  target,  so  contrast 
enhancement  is  one  option  that  can  be  changed  when  calculating  probability  versus 
range.  During  a  search,  for  example,  the  sensor  is  set  to  see  the  environment;  the  target 
has  not  been  found.  When  a  likely  location  is  found,  however,  then  the  imager  might  be 
optimized  to  see  if  a  target  is  present.  So  contrast  enhancement  might  not  be  used  in  the 
wide  field  of  view  during  search,  but  would  be  employed  for  the  target  identification  task. 

The  process  is  the  same  for  CTFHsys  and  CTFVsys,  so  horizontal  calculations  are  used  as 
examples.  Using  Equation  8.3  for  CTFHsys: 
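$$ CTFH_{sys}(\xi) = \left[ CTFH_{eye}^{2}(\xi) + CTFH_{sen}^{2}(\xi) \right]^{1/2} \qquad (8.42) $$

where CTFHeye(ξ) (Equation 8.43) is the eye-limited term of Equation 8.3, in which the
naked-eye threshold CTF(ξ/SMAG) is divided by Kcon, and CTFHsen(ξ) (Equation 8.44) is the
sensor-noise term of Equation 8.3, which contains the noise bandwidths QHhor(ξ) and QVhor
and is independent of Kcon.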


Similar  equations  can  be  written  for  CTFVsys.  Models  like  SSCAM,  12MRC,  and  12CCD 
output  four  arrays  for  each  illumination  and  target-background  combination  modeled. 
Those arrays are CTFHeye, CTFHsen, CTFVeye, and CTFVsen. The value for Kcon
determines  how  the  four  arrays  are  combined  to  predict  the  system  CTF. 

The options provided in the reflective model are: no contrast enhancement (Cdsp = Ctgt),
display contrast of 0.25 (Cdsp = 0.25), and display contrast of 0.5 (Cdsp = 0.5). The value
of 0.5 was determined by optimizing a set of 144 tactical vehicle images (twelve aspects
of  twelve  different  vehicles).  Each  image  was  individually  optimized  to  bring  out  the 
particular  cues  needed  to  ID  that  vehicle  at  that  aspect.  Linearity  was  not  enforced;  pieces 
of  the  picture  were  subdued  or  enhanced  as  necessary  to  provide  an  optimum  for 
identification.  Once  the  optimizing  process  was  complete,  the  contrast  of  the  set  was 
measured  at  0.5.  We  feel  that  it  is  impossible  for  an  automated  process  to  duplicate  this 
degree  of  optimization,  and  that  0.5  therefore  represents  an  extreme  for  modeling 
purposes. 

The  0.25  option  resulted  from  applying  histogram  equalization,  local  area  processing,  and 
allowing  some  non-linear  suppression  of  bright  areas.  The  process  was  “by  hand”  in  the 
sense  that  we  ensured  that  no  cues  were  lost  due  to  the  histogram  equalization  placement 
of  gray  levels.  The  measured  contrast  of  the  resulting  target  set  was  0.25.  This  represents 
the  contrast  that  can  probably  be  achieved  automatically. 

When doing range calculations,

$$ K_{con} = \frac{C_{dsp}}{C_{tgt}} \qquad (8.45) $$


As  range  to  the  target  increases  and  target  contrast  (Ctgt)  decreases,  contrast 
enhancement  maintains  the  displayed  contrast  at  a  high  level.  Of  course,  while  the  eye 
term  in  CTFgys  can  be  moderated  by  contrast  enhancement,  the  noise  term  cannot.  Noise 
must  be  low  for  contrast  enhancement  to  help  range  performance. 
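As a small illustration of the three display-contrast options, the sketch below evaluates Kcon at a few apparent contrasts. It assumes Kcon is the ratio of displayed contrast to apparent target contrast (Equation 8.45); the function name and the sample contrast values are hypothetical.

```python
def k_con(c_tgt, c_dsp=None):
    """Contrast enhancement factor K_con = C_dsp / C_tgt.

    c_dsp=None models the 'no contrast enhancement' option (C_dsp = C_tgt).
    """
    if c_tgt <= 0.0:
        raise ValueError("apparent target contrast must be positive")
    if c_dsp is None:
        return 1.0
    return c_dsp / c_tgt


if __name__ == "__main__":
    for c_tgt in (0.20, 0.10, 0.05):  # apparent contrast falls as range increases
        row = [k_con(c_tgt), k_con(c_tgt, 0.25), k_con(c_tgt, 0.5)]
        print(f"C_tgt={c_tgt:.2f}: K_con (none, 0.25, 0.5) = "
              + ", ".join(f"{k:.2f}" for k in row))
```

The factor grows as apparent contrast drops, which is why the eye-limited term benefits from enhancement at long range while the noise term does not.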
8.5.3 Calculating Probability of Task Performance

At each range, apparent contrast Ctgt is established based on zero range contrast Ctgt-0
by using Beer's Law or MODTRAN. A contrast enhancement model is selected, and then
Kcon is calculated using Equation 8.45. CTFHsys and CTFVsys can then be calculated. The
TTP metric is calculated for both the horizontal and vertical dimensions.


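$$ TTP_{H} = \int_{\xi_{low}}^{\xi_{cut}} \left[ \frac{C_{tgt}}{CTFH_{sys}(\xi)} \right]^{1/2} d\xi \qquad (8.46) $$

$$ TTP_{V} = \int_{\eta_{low}}^{\eta_{cut}} \left[ \frac{C_{tgt}}{CTFV_{sys}(\eta)} \right]^{1/2} d\eta \qquad (8.47) $$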


“Cycles on target” Nresolved is found using Equation 8.48.

$$ N_{resolved} = \frac{\sqrt{A_{tgt} \, TTP_{H} \, TTP_{V}}}{Range} \qquad (8.48) $$


The out-of-band Spurious Response Ratio (SRRout) is found for both horizontal and
vertical, and Nresolved is corrected for the presence of sampling artifacts; see Part 7.

$$ N_{sampled} = N_{resolved} \sqrt{(1 - 0.58 \, SRRH_{out})(1 - 0.58 \, SRRV_{out})} \qquad (8.49) $$

The TTPF is used to find the probability of task performance.

$$ P_{task} = \frac{(N_{sampled}/V_{50})^{E}}{1 + (N_{sampled}/V_{50})^{E}} \qquad (8.50) $$

where

$$ E = 1.51 + 0.24 \, (N_{sampled}/V_{50}) \qquad (8.51) $$
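The chain from system CTF to task probability can be sketched as follows. This is a minimal illustration, not the model software: the function names are hypothetical, the integral is a simple trapezoid rule over intervals where the threshold is below target contrast, V50 (the metric value giving 50 percent task probability) is assumed as the TTPF parameter, and target size in meters with range in kilometers is assumed so that the ratio is in milliradians.

```python
import math


def ttp_integral(freqs, ctf_sys, c_tgt):
    """Trapezoid estimate of the integral of sqrt(C_tgt / CTF_sys) over frequency.

    Only intervals where CTF_sys is below C_tgt at both ends contribute,
    which approximates the limits between the low and cut frequencies.
    """
    total = 0.0
    for i in range(len(freqs) - 1):
        if ctf_sys[i] < c_tgt and ctf_sys[i + 1] < c_tgt:
            a = math.sqrt(c_tgt / ctf_sys[i])
            b = math.sqrt(c_tgt / ctf_sys[i + 1])
            total += 0.5 * (a + b) * (freqs[i + 1] - freqs[i])
    return total


def n_resolved(a_tgt_m2, ttp_h, ttp_v, range_km):
    """Cycles on target: sqrt(A_tgt * TTP_H * TTP_V) / Range."""
    return math.sqrt(a_tgt_m2 * ttp_h * ttp_v) / range_km


def n_sampled(n_res, srr_h_out, srr_v_out):
    """Correct resolved cycles for out-of-band spurious response."""
    return n_res * math.sqrt((1.0 - 0.58 * srr_h_out) * (1.0 - 0.58 * srr_v_out))


def ttpf(n, v50):
    """Probability of task performance from the TTPF."""
    ratio = n / v50
    e = 1.51 + 0.24 * ratio
    return ratio ** e / (1.0 + ratio ** e)


if __name__ == "__main__":
    freqs = [0.5 * i for i in range(1, 21)]                  # cycles per milliradian
    ctf = [0.005 * math.exp(0.6 * f) for f in freqs]         # made-up system CTF
    ttp = ttp_integral(freqs, ctf, c_tgt=0.3)
    n = n_sampled(n_resolved(9.0, ttp, ttp, 4.0), 0.1, 0.1)
    print(f"TTP={ttp:.2f}  N_sampled={n:.2f}  P={ttpf(n, v50=20.0):.2f}")
```

The CTF curve and the V50 value of 20 in the demonstration are placeholders chosen only to exercise the functions.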


8.6  Minimum  Resolvable  Contrast 

In the laboratory, sensors are characterized using Air Force 3-bar charts; a chart is shown
in  Figure  8.15.  Each  bar  pattern  is  five  times  longer  than  the  width  of  a  single  bar. 
Generally,  charts  with  1.0  contrast  are  used  in  the  laboratory,  but  charts  are  available  with 
lower  contrast  (but  generally  the  contrast  is  above  about  0.2).  A  plot  of  threshold  contrast 
versus  spatial  frequency  is  called  Minimum  Resolvable  Contrast  (MRC).  A  plot  of 
limiting  frequency  versus  illumination  level  for  a  particular  contrast  is  called  a  limiting 
light  measurement. 

When  predicting  the  results  of  an  MRC  or  limiting  light  experiment,  the  amplitude 
difference  between  the  center  bar  and  the  adjoining  spaces  is  used  in  place  of  the  system 
MTF.  The  amplitudes  are  calculated  as  follows. 
Figure 8.15 Air Force 3-bar chart used to characterize reflected-light imagers.

where

ξ' = dummy variable for integration
W = 1/(2ξ)
L = 5W
HL(ξ), the bar-length MTF, is sin(πξL) / (πξL)
HW(ξ), the bar-width MTF, is sin(πξW) / (πξW)
SL = Fractional intensity due to blur of bar length

The relationship between CTFsys and MRC is:
Modeling Thermal Imagers

Thermal  imagers  sense  heat  energy  with  wavelengths  between  three  and  twelve  microns. 
The  three  to  five  micron  band  is  called  mid-wave  IR  (MWIR)  and  the  eight  to  twelve 
micron  band  is  called  long-wave  IR  (LWIR).  Figure  9.1  shows  typical  atmospheric 
transmission  for  a  one  Kilometer,  horizontal  path;  there  are  three  clear  windows  from  3  to 
4.2,  4.4  to  5,  and  8  to  13  microns. 


Figure 9.1 Atmospheric transmission over a 1 kilometer path (transmission versus wavelength in microns).

Figure  9.2  shows  a  schematic  diagram  of  a  thermal  imager  which  uses  a  staring  focal 
plane  array  (FPA)  of  detectors.  The  thermal  scene  is  imaged  by  the  objective  lens  onto 
the  FPA.  The  individual  detector  signals  are  time  multiplexed  and  converted  to  a  video 
signal  for  display. 


Figure  9.2  Illustration  of  Thermal  Staring  Sensor 


Figure  9.3  shows  a  parallel  scan  thermal  imager.  The  afocal  provides  a  magnified  image 
at  the  scanner.  The  scene  is  scanned  over  a  linear  array  of  detectors  by  an  oscillating  or 
rotating  mirror.  The  time  that  each  detector  dwells  on  a  point  in  the  scene  is  less  than  that 
of  the  staring  sensor;  as  a  result,  sensitivity  is  reduced. 
Figure 9.3 Illustration of Scanning Thermal Sensor


Everything near room temperature radiates at these wavelengths. The emissivity of
natural objects is generally above 70 percent; most manmade objects are also highly
emissive. Thermal sensors derive their images from small variations in temperature and
emissivity within the scene. Typically, the thermal scene is very low contrast.

Figure 9.4 shows the spectral radiance from blackbodies at 300 K and 303 K. The
difference between the two curves is also shown. As can be seen from the figure, a 3 K
change in blackbody temperature results in only a small relative change in the radiated
energy. However, a 3 K average for the apparent temperature difference within a scene
represents very good thermal imaging conditions. A thermal imager will provide a good
image under these conditions. The thermal scene is low contrast even under good thermal
imaging conditions.


Figure 9.4 Thermal radiation from 300 K and 303 K blackbodies; both are near
room temperature. Although the difference represents good thermal contrast, the
relative difference is small.
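The small relative change shown in Figure 9.4 can be checked with Planck's equation. The sketch below is illustrative only; the function name is hypothetical, and the physical constants are the standard SI values.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K


def planck_radiance(wavelength_um, temp_k):
    """Blackbody spectral radiance in W cm^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6                              # wavelength in meters
    x = H * C / (lam * KB * temp_k)
    radiance_si = 2.0 * H * C ** 2 / lam ** 5 / math.expm1(x)  # W m^-2 sr^-1 m^-1
    return radiance_si * 1e-4 * 1e-6                        # per cm^2 and per micron


if __name__ == "__main__":
    for lam in (4.0, 10.0):
        l300 = planck_radiance(lam, 300.0)
        l303 = planck_radiance(lam, 303.0)
        print(f"{lam:4.1f} um: L(300 K)={l300:.3e}  L(303 K)={l303:.3e}  "
              f"relative change={(l303 - l300) / l300:.1%}")
```

At 10 microns the 3 K change raises the radiance by only about five percent, which is why even a good thermal scene is low contrast.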


Although the typical thermal scene is very low contrast, exceptions do exist. For
example, the radiance difference between sky and ground can be quite large on a clear
day. Also, the classic “burning tank” can overload a thermal imager. In general, however,
thermal sensors are designed to map small differences in the scene's radiant energy into a
usable displayed image.

In the above example, scene thermal contrast was generated by the temperature
difference between two blackbodies. In the more general case, the spectral radiance from
a thermal scene will depend upon a number of factors. The spectral radiance of an object
will depend upon its surface temperature and emissivity and upon the nature of the light
being reflected or scattered from its surface. The apparent spectral radiance of an object
as seen by an imager is also affected by the spectral transmission of the atmosphere.
These factors, coupled with the spectral sensitivity of the imager itself, will determine the
effective thermal contrast within the scene as sensed by a particular imager.

Apparent  temperature  (also  called  equivalent  blackbody  temperature)  is  often  used  as  a 
radiometric  unit.  A  radiometer  is  calibrated  in  terms  of  its  response  to  a  change  in 
blackbody temperature. The radiometer is then used to measure the thermal contrast of a
scene, and its output is expressed as “temperature.” The radiometer does not measure the
temperature state of the scene; that is, the kinetic energy of the molecules in the scene
objects is not measured. The radiometer is detecting the in-band energy from the scene,
as weighted by the spectral response of the instrument itself. The effective blackbody
temperature  measured  in  one  spectral  band  cannot  be  assumed  for  a  different  spectral 
band.  When  comparing  MWIR  to  LWIR  sensors,  some  knowledge  is  required  of  the 
relative  signatures  in  the  two  spectral  bands. 


9.1  Signal  and  Noise  in  Thermal  Imagers 

The  units  used  to  describe  signal  and  noise  for  thermal  imagers  are  very  different  than  the 
units  used  when  modeling  reflected-light  sensors.  However,  aside  from  the  details  of 
calculating  signal  and  noise,  the  basic  CTFsys  theory  is  exactly  the  same  as  the  theory 
described  in  Part  8. 

The  dominant  noise  in  thermal  photon  detectors  is  generation  recombination  (GR)  noise. 
In  the  theoretical  limit,  GR  noise  is  the  equivalent  of  the  shot  noise  found  in  devices. 
However,  noise  can  be  increased  by  charge-carrier-phonon  interactions.  Thermal 
detectors  are  generally  Background  Limited  in  Performance  (BLIP);  noise  decreases  in 
proportion  to  the  square  root  of  detector  photon  flux.  However,  part  of  the  background 
flux  arises  from  within  the  imager  itself,  not  just  the  scene.  Even  with  perfect  cold 
shielding,  emission  from  the  optics  can  be  significant.  Also,  the  read-out  electronics  adds 
noise,  particularly  with  high  F-number  cold  shields.  Predicting  the  effect  of  reduced 
scene  temperatures  on  noise  is  difficult.  The  noise  from  a  thermal  detector  is  very  much 
dependent  on  system  design  and  mounting  factors  as  well  as  scene  thermal  flux. 
Generally,  a  thermal  detector’s  noise  characteristics  are  specified  for  a  300  K  background 
temperature  and  a  unique  cold  shield  configuration. 

Spectral detectivity (Dλ) is used to specify the noise in a thermal detector.

$$ D_{\lambda} = 1 / NEP_{\lambda} \qquad (9.1) $$

NEPλ is the spectral noise-equivalent power; it is the monochromatic signal power
necessary to produce an RMS signal to noise of unity. Spectral D-star (Dλ*) is a
normalization of Dλ to unit area and bandwidth.

$$ D_{\lambda}^{*} = D_{\lambda} \, (A_{det} \, \Delta f)^{1/2} \qquad (9.2) $$

where

Δf = temporal bandwidth and

Adet = Active area of a single detector on the FPA = Hdet Vdet

The thermal model uses peak spectral D-star and relative detector response at other
wavelengths to characterize detector performance.

D*λpeak = Dλ* at wavelength of peak response and

S(λ) = Response of detector at wavelength λ, relative to peak response.

The  spectral  radiant  power  on  the  focal  plane  array  is  calculated  as  follows. 

$$ E_{fpa} = \frac{\pi \, \tau \, L_{scene}}{4 \, F\#^{2}} \qquad (9.3) $$

where

Efpa = watt cm^-2 on the detector array,

Lscene = watts cm^-2 sr^-1 µ^-1 from the thermal scene, and
τ = transmission of optics.

The parameters τ, Lscene, and Efpa are all functions of wavelength λ. The spectral
radiant power on a single detector of the array is:

$$ E_{det} = A_{det} \, E_{fpa} = \frac{\pi \, \tau \, A_{det} \, L_{scene}}{4 \, F\#^{2}} \qquad (9.4) $$


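The signal to noise in one pixel (SNpix) in one second can now be calculated.

$$ SN_{pix} = \frac{D^{*}_{\lambda peak}}{(2 A_{det})^{1/2}} \int_{\Delta\lambda} E_{det}(\lambda) \, S(\lambda) \, d\lambda \qquad (9.5) $$

$$ SN_{pix} = \frac{\pi \, \tau \, D^{*}_{\lambda peak} \, (A_{det})^{1/2}}{4 \sqrt{2} \, F\#^{2}} \int_{\Delta\lambda} L_{scene}(\lambda) \, S(\lambda) \, d\lambda \qquad (9.6) $$

where Δλ is the spectral band of the sensor, and 2 Hertz is the bandwidth Δf.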

To  estimate  the  differential  spectral  radiance  resulting  from  a  delta  temperature  near  300 
K,  the  following  equation  is  used.  As  long  as  the  bars  are  at  a  temperature  close  to  300  K, 
the  spectral  nature  of  the  difference  signal  is  closely  approximated  by  the  partial 
derivative  of  the  blackbody  equation  with  respect  to  temperature  evaluated  at  300  K. 

$$ \Delta L_{scene}(\lambda) = \Gamma \, \partial L(\lambda, T) / \partial T \qquad (9.7) $$

where the partial derivative is evaluated at 300 K and

L(λ,T) = Planck's Equation for blackbody radiation,

T = Temperature, and

Γ = Amplitude of apparent blackbody temperature difference.

Using  the  Equation  9.7  expression  for  the  spectral  radiance  based  on  temperature 
difference,  the  signal  to  noise  on  one  detector  in  one  second  is  now: 
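$$ SN_{pix} = \frac{\pi \, \tau \, \Gamma \, \delta \, D^{*}_{\lambda peak} \, (A_{det})^{1/2}}{4 \sqrt{2} \, F\#^{2}} \qquad (9.8) $$

where

$$ \delta = \int_{\Delta\lambda} \frac{\partial L(\lambda, T)}{\partial T} \, S(\lambda) \, d\lambda \qquad (9.9) $$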
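The band integral δ of Equation 9.9 can be estimated numerically. The sketch below is a minimal illustration under stated assumptions: a flat relative response S(λ) = 1 unless another is supplied, a central-difference derivative of Planck's equation at 300 K, and trapezoid integration. The function names are hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K


def planck_radiance(wavelength_um, temp_k):
    """Blackbody spectral radiance in W cm^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6
    x = H * C / (lam * KB * temp_k)
    return 2.0 * H * C ** 2 / lam ** 5 / math.expm1(x) * 1e-10


def d_radiance_dt(wavelength_um, temp_k=300.0, dt=0.01):
    """Central-difference estimate of dL/dT in W cm^-2 sr^-1 um^-1 K^-1."""
    hi = planck_radiance(wavelength_um, temp_k + dt)
    lo = planck_radiance(wavelength_um, temp_k - dt)
    return (hi - lo) / (2.0 * dt)


def delta(band_um, response=lambda lam: 1.0, steps=400):
    """Trapezoid estimate of the integral of (dL/dT) * S(lambda) over the band."""
    lo, hi = band_um
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        lam = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * d_radiance_dt(lam) * response(lam)
    return total * h


if __name__ == "__main__":
    print(f"LWIR 8-12 um: delta = {delta((8.0, 12.0)):.2e} W cm^-2 sr^-1 K^-1")
    print(f"MWIR 3-5 um:  delta = {delta((3.0, 5.0)):.2e} W cm^-2 sr^-1 K^-1")
```

A measured detector response curve would be passed as `response` in place of the flat default.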

In one square radian, the signal to noise would increase by an amount (F0²/(Hpit Vpit))^(1/2)
where F0 is the effective focal length of the afocal or objective lens. The signal to
detector noise in one second and one square milliradian is:

$$ SN_{det} = \frac{(1E-6) \, \pi \, \tau \, \Gamma \, \delta \, F_{0} \, \eta_{stare} \, D^{*}_{\lambda peak}}{4 \sqrt{2} \, F\#^{2}} \qquad (9.10) $$

where ηstare is the square root of the fill factor ratio Hdet Vdet/(Hpit Vpit). Equation 9.10
gives the signal to noise for temperature difference Γ. Noise modulation at the display is
needed to find CTFsys. Setting signal to noise to unity, Γdet² is the noise variance in units of
(K-milliradian)².

$$ \Gamma_{det} = \frac{4 \sqrt{2} \, F\#^{2}}{(1E-6) \, \pi \, \tau \, \delta \, F_{0} \, \eta_{stare} \, D^{*}_{\lambda peak}} \qquad (9.10) $$


9.2  CTFsys  for  Thermal  Imagers 

Calculating CTFsys requires that detector noise be expressed as display luminance noise.
This in turn requires a mapping between radiometric temperature changes in the scene
and the matching luminance changes on the display. The gain through the imager must be
established in terms of foot-Lamberts per Kelvin. As with reflected-light imagers, the
average and minimum display luminance is a model input.

Scene  contrast  temperature  (SCNtmp)  is  the  delta  radiometric  temperature  in  the  scene 
needed  to  generate  the  average  display  luminance  when  minimum  luminance  is  zero. 
Recall  that  the  thermal  image  arises  from  small  variations  in  temperature  and  emissivity 
within  the  scene,  and  these  small  variations  are  superimposed  on  a  large  background  flux. 
Zero  luminance  on  the  display  corresponds  to  the  minimum  scene  radiant  energy,  not  to 
zero  radiant  energy.  SCNtmp  is  not  the  absolute  background  radiometric  temperature;  it  is 
the  temperature  contrast  needed  to  raise  the  display  luminance  from  zero  to  average. 
SCNtmp  is  used  rather  than  Kcon  to  indicate  sensor  gain  state.  A  large  SCNtmp  means  gain 
is  low,  a  small  SCNtmp  means  the  gain  is  high. 


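With display noise modulation established, CTFsys can be calculated.

$$ CTFH_{sys}(\xi) = \frac{CTF(\xi / SMAG)}{M_{dsp} \, H_{sys}(\xi)} \left[ 1 + \frac{\alpha^{2} \, \Gamma_{det}^{2} \, QH_{hor}(\xi) \, QV_{hor}}{SCN_{tmp}^{2}} \right]^{1/2} \qquad (9.11) $$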
where

α = 169.6 root-Hertz (a proportionality factor)
ξ = horizontal spatial frequency in (milliradian)^-1
η = vertical spatial frequency in (milliradian)^-1
CTF(ξ/SMAG) = naked eye Contrast Threshold Function; see Appendix E
SCNtmp = scene temperature which generates average display luminance
B(ξ or η) = the Equation (3.9) eye filters
Heye(ξ or η) = eyeball MTF; see Appendix E
Helec(ξ) = horizontal electronics MTF
Velec(η) = vertical electronics MTF
Hdsp(ξ) = horizontal display MTF
Vdsp(η) = vertical display MTF
Hsys(ξ) = horizontal system MTF
Vsys(η) = vertical system MTF
QHhor = horizontal noise bandwidth for CTFHsys defined by Equation 8.5
QVhor = vertical noise bandwidth for CTFHsys defined by Equation 8.6
QHver = horizontal noise bandwidth for CTFVsys defined by Equation 8.7
QVver = vertical noise bandwidth for CTFVsys defined by Equation 8.8
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(9.13) 

(9.14) 

(9.15) 

(9.16) 


ηstare currently represents only detector fill factor. However, due to limitations in photo-electron
storage capacity, the FPA might not integrate signal for a full field time. The efficiency
factor used in Equation 9.10 should be adjusted for detector integration time.

$$ \eta_{stare} = \left[ \frac{t_{int} \, T_{CCD} \, H_{det} \, V_{det}}{H_{pit} \, V_{pit}} \right]^{1/2} \qquad (9.17) $$

where

tint = detector integration time ≤ 1/TCCD
TCCD = field rate (probably 60 Hertz)
Hdet = horizontal active area of detector
Vdet = vertical active area of detector
Hpit = horizontal detector pitch
Vpit = vertical detector pitch

Aside from slightly different MTF considerations discussed below, the only change
needed to model scanning imagers is to adjust the noise for the reduced dwell time. For
scanning sensors, a different efficiency factor (ηeff) is used. The dwell time is reduced by
the amount of detector area divided by the image area at the detector focal plane. Also,
the scene is generally over-scanned by the scanner, either to look at thermal references or
for turn-around, so ηscan is less than unity. For scanning thermal imagers, ηeff is used in
Equation 9.10 rather than ηstare.

$$ \eta_{eff} = \left[ \frac{\eta_{scan} \, N_{det} \, H_{det} \, V_{det}}{F_{0}^{2} \, FOV_{H} \, FOV_{V}} \right]^{1/2} \qquad (9.18) $$

where

ηscan = scan efficiency (generally 0.4 to 0.75),

Ndet = total number of detectors, either in parallel or in Time Delay and Integrate,
Example: for a 240 by 4 in TDI FPA, Ndet = 960
FOVH = horizontal field of view of the imager in radians, and
FOVV = vertical field of view of the imager in radians.
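The two efficiency factors can be compared with a short sketch. It assumes the square-root forms of Equations 9.17 and 9.18 (dwell fraction times fill or area ratio, under a square root, consistent with ηstare being the square root of the fill factor ratio). The function names and all numerical inputs are hypothetical.

```python
import math


def eta_stare(t_int_s, field_rate_hz, h_det, v_det, h_pit, v_pit):
    """Staring-array efficiency: sqrt(integration fraction * fill factor)."""
    duty = t_int_s * field_rate_hz
    if duty > 1.0 + 1e-9:
        raise ValueError("integration time cannot exceed one field time")
    return math.sqrt(duty * h_det * v_det / (h_pit * v_pit))


def eta_eff(eta_scan, n_det, h_det_cm, v_det_cm, f0_cm, fov_h_rad, fov_v_rad):
    """Scanning efficiency: sqrt(scan efficiency * total detector area / image area)."""
    image_area_cm2 = (f0_cm * fov_h_rad) * (f0_cm * fov_v_rad)
    return math.sqrt(eta_scan * n_det * h_det_cm * v_det_cm / image_area_cm2)


if __name__ == "__main__":
    # Staring array: 20 micron detectors on 25 micron pitch, 60 Hz field rate.
    print(f"full field time:   {eta_stare(1 / 60, 60.0, 20, 20, 25, 25):.3f}")
    print(f"4 ms integration:  {eta_stare(0.004, 60.0, 20, 20, 25, 25):.3f}")
    # Scanner: 240 by 4 TDI array (960 detectors), 25 micron detectors,
    # 10 cm focal length, 0.04 by 0.03 radian field of view.
    print(f"scanning imager:   {eta_eff(0.6, 960, 0.0025, 0.0025, 10.0, 0.04, 0.03):.3f}")
```

Detector and pitch dimensions only need consistent units in `eta_stare`; in `eta_eff` the detector dimensions and focal length must share the same length unit.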

MTF of the optics, detector, and display are the primary contributors to system MTF for
thermal imagers. Other likely sources of blur are jitter in the line-of-sight due to lack of
stabilization, vibration of the display relative to the eye, and digital processing. For
staring sensors, no MTF is associated with detector integration time. This is not true for
scanning imagers, however. During photo-electron integration, the scene is scanned over
the detector, so image blur results from temporal integration of the detector signal. This is
an important source of blur in scanning imagers.


9.3  Predicting  Probability  versus  Range 

9.3.1  Contrast  Transmission  through  the  Atmosphere 

Certain assumptions are made to simplify calculations. These assumptions constrain the
scenario for which the model is appropriate.

a) The target and background are co-located; the target is viewed against local
terrain. Range to the target and range to the background are the same.

b) Absorption as well as scattering in the atmosphere can be important. An
interface to MODTRAN is provided. Since the spectral nature of the
target-to-background signature is defined by Equation 9.9, this spectral weighting is
included in the MODTRAN implementation.

c) Average radiance seen by the imager does not change with range. The
background flux is from a 300 K blackbody. Target-to-background signal
disappears into the average radiance. The 300 K radiance establishes sensor noise
characteristics.

Apparent target contrast results from predicting the apparent radiometric temperature
difference and then dividing by twice the scene contrast temperature. SCNtmp is
determined by hardware setup and the environment viewed. It is possible for those
intimate with specific hardware designs to accurately determine SCNtmp. However, some
simplifying assumptions are sufficient for most modeling purposes.

a) When the imager is optimized on a specific target, SCNtmp is between three
and five times the target apparent RSS contrast. If an observer is attempting to
identify a target, for example, it is reasonable to assume that the imager is
optimized for the purpose. Since larger SCNtmp results in lower contrast, an
optimistic assumption is that SCNtmp is three times the target contrast.

b) When searching for a target, the imager gain is adjusted based on scene
content and not changed. Thermal contrast between 0.1 and 0.3 K represents poor
thermal scene contrast. Moderately good contrast is between 1 and 3 K; first
generation thermal imagers (circa the mid 1980s) work well with thermal
contrast in the 1 to 3 K range. Likely values for SCNtmp are 1 K for poor thermal
scenes and 5 to 10 K for good thermal contrast. However, when modeling search,
do not input SCNtmp less than three times the target intrinsic (zero range) thermal
contrast. That is, if the target contrast is assumed to be 1.25 K, then SCNtmp
would be at least 3.75 K even if poor weather is assumed.

c) There are cases like thermal line-scanners where the total field of view is
extremely wide. In these cases, SCNtmp is likely to be 20 K or larger when
thermal conditions are good.

d) Since SCNtmp represents display average luminance, it is not physically
possible for SCNtmp to be less than half of the target to background thermal
contrast. Given the realities of thermal signatures, SCNtmp will realistically be
three to five times the target RSS thermal signature as suggested above.


9.3.2  Effect  of  Contrast  Enhancement 

Contrast enhancement can significantly boost performance. As an example, an observer
is searching using a LWIR imager on a fairly humid day with a 0.7 per kilometer
transmission. If a 2 K target is at four kilometers, then apparent contrast is (0.7)^4 × 2/20 or
about 0.024 contrast on the display. In this example, scene contrast temperature is taken
as five times the target temperature or 10 K. It is assumed that the background scene
content is fairly hot, and this really establishes the scene contrast temperature. If the
observer detects the target and then optimizes the imager for target ID, SCNtmp is
adjusted (gain is increased) to a value five times the apparent temperature (SCNtmp
becomes 2.4 K), so that the contrast on the display is 0.1. Of course, the benefit of the
improved contrast depends on the noise characteristics of the sensor, but the improved
contrast could be significant.
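The arithmetic of this example can be written out step by step. The block below only restates the numbers given in the paragraph, using the relation that displayed contrast is the apparent temperature difference divided by twice the scene contrast temperature; the labels "search" and "ID" are added here for readability.

```latex
\begin{align*}
T_{tgt} &= 2\,\mathrm{K} \times (0.7)^{4} \approx 0.48\,\mathrm{K} \\
C_{tgt}\ (\text{search}) &= \frac{T_{tgt}}{2\,SCN_{tmp}} = \frac{0.48}{2 \times 10} \approx 0.024 \\
SCN_{tmp}\ (\text{ID}) &= 5 \times 0.48\,\mathrm{K} \approx 2.4\,\mathrm{K} \\
C_{tgt}\ (\text{ID}) &= \frac{0.48}{2 \times 2.4} = 0.1
\end{align*}
```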

The  following  “rule  of  thumb”  is  suggested. 
a) When search is modeled, SCNtmp is set based on scene contrast or at least
three times the target signature, whichever is bigger. Atmospheric transmission
reduces the apparent thermal signature (Ttgt) as range increases, and Ctgt is
modeled as Ctgt = Ttgt/(2 SCNtmp).

b) When modeling target ID or any circumstance where contrast enhancement
can be assumed, then Ctgt is fixed at the zero range value. If SCNtmp-0 is the
input value (the zero range value) of scene contrast temperature, then SCNtmp =
Ttgt SCNtmp-0 / Ttgt-0.


9.3.3  Calculating  Probability  of  Task  Performance 

At each range, apparent thermal contrast Ttgt is established based on zero range contrast
Ttgt-0 by using Beer's Law or MODTRAN. If no contrast enhancement is assumed, then
Ctgt = Ttgt/(2 SCNtmp) and SCNtmp remains constant at SCNtmp-0. If contrast
enhancement is used, then Ctgt = Ttgt-0/(2 SCNtmp-0) but the SCNtmp used to calculate
CTFHsys and CTFVsys decreases with range: SCNtmp = Ttgt SCNtmp-0 / Ttgt-0. CTFHsys
and CTFVsys are calculated and the TTP metric is found for both the horizontal and
vertical dimensions.
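The two cases in this paragraph can be sketched as a per-range bookkeeping step. The code below is illustrative only: it assumes Beer's Law with a fixed transmission per kilometer (MODTRAN is not modeled), and the function and parameter names are hypothetical.

```python
def thermal_contrast_state(range_km, t_tgt0_k, scn_tmp0_k, tau_per_km, enhance):
    """Return (C_tgt, SCN_tmp) at a given range.

    t_tgt0_k   : zero-range target thermal contrast (K)
    scn_tmp0_k : zero-range (input) scene contrast temperature (K)
    enhance    : True when contrast enhancement is assumed
    """
    t_tgt = t_tgt0_k * tau_per_km ** range_km        # apparent signature at range
    if enhance:
        c_tgt = t_tgt0_k / (2.0 * scn_tmp0_k)         # held at the zero-range value
        scn_tmp = t_tgt * scn_tmp0_k / t_tgt0_k       # gain raised as signature falls
    else:
        c_tgt = t_tgt / (2.0 * scn_tmp0_k)
        scn_tmp = scn_tmp0_k                          # gain left unchanged
    return c_tgt, scn_tmp


if __name__ == "__main__":
    for rng in (1.0, 2.0, 4.0, 8.0):
        plain = thermal_contrast_state(rng, 2.0, 10.0, 0.7, enhance=False)
        boosted = thermal_contrast_state(rng, 2.0, 10.0, 0.7, enhance=True)
        print(f"{rng:4.1f} km  no enhancement: C={plain[0]:.3f} SCN={plain[1]:.2f} K   "
              f"enhanced: C={boosted[0]:.3f} SCN={boosted[1]:.2f} K")
```

The smaller SCNtmp returned in the enhanced case is what raises the noise term when CTFHsys and CTFVsys are subsequently calculated.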
“Cycles on target” Nresolved is found using Equation 9.21.

$$ N_{resolved} = \frac{\sqrt{A_{tgt} \, TTP_{H} \, TTP_{V}}}{Range} \qquad (9.21) $$

The out-of-band Spurious Response Ratio (SRRout) is found for both horizontal and
vertical, and Nresolved is corrected for the presence of sampling artifacts; see Part 7.

The TTPF is used to find the probability of task performance, where

$$ E = 1.51 + 0.24 \, (N_{sampled}/V_{50}) $$
9.4 Minimum Resolvable Temperature

In the laboratory, thermal sensors are characterized using 4-bar patterns. The bar-target
arrangement is shown in Figure 9.4. The bars are cut into a blackened, metal plate which
is mounted in front of a blackbody. The temperature difference between the plate and
blackbody is controlled. Each bar pattern is seven times longer than the width of a single
bar. A plot of temperature difference between the bars and spaces versus spatial
frequency is called Minimum Resolvable Temperature (MRT).

Figure 9.4 4-bar pattern used for MRT. Blackbody is viewed through
the openings of metal plate. (Figure labels: blackbody; plate with bar-pattern cut-outs.)

MRT  is  a  poorly  controlled  measurement.  The  imager  gain  and  level  are  “optimized”  for 
each  bar  size;  saturation  is  permitted.  Display  luminance  and  contrast  are  not  controlled 
or  measured.  The  imager’s  settings  are  not  monitored,  and  the  bar  targets  are  not  viewed 
in  a  fashion  that  correlates  to  field  performance.  Experience  over  many  years  suggests 
that  only  a  gross  estimate  of  laboratory  MRT  can  be  predicted.  MRT  should  only  be  used 
as  an  indicator  that  the  system  is  operating  at  some  acceptable  level. 

The bar pattern positions for which the modulation difference is calculated are shown by
the dotted lines in Figure 9.5. This modulation difference is used in place of the system
MTF. Assume that the white area is hotter and is called the bar. The amplitudes are
calculated as follows.

Figure 9.4 4-bar pattern showing positions where difference modulation is calculated.


H4-l,ar  =«'  l-^HsyA4')Hw{e)[2cos(2rrW4')  +  2cos{6rrW^'m'  (9.25) 

Aar  ~  ^  1-00  ^sys  [^')H^_^ar[^')cos{27rW^')d^'  (9.26) 

^ space  ~  ^  f- 00  ^ sys  (!')  ^^4-6ar  [^')cOii(A7cW^')d^'  (9.27) 

Si  (S-ZS) 

where 
ξ' = dummy variable for integration
W = 1/(2ξ)
L = 7W
HL(ξ), the bar-length MTF, is sin(πξL) / (πξL)
HW(ξ), the bar-width MTF, is sin(πξW) / (πξW)
SL = Fractional intensity due to blur of bar length

The relationship between CTFsys and MRT is:


MRT{^) 


^ sys  {S)CTF^,(p 

[AbaA^)-yp^ce(S))Si{S) 


(9.29) 
Appendix A: Description of Validation Experiments

A  robust  performance  metric  must  provide  accurate  predictions  for  various  shape  and  size  blurs,  good  and 
poor  intrinsic  target  contrast,  various  levels  and  types  of  noise,  and  must  accurately  predict  the  performance 
of  sampled  imagers.  A  series  of  experiments  was  designed  to  ensure  the  metric  is  accurate  for  a  wide 
variety  of  image  characteristics.  Some  of  the  experiments  are  not  reported  here.  Experiments  which 
demonstrated  that  the  model  works  with  both  fixed  spatial  noise  and  temporally  random  noise  are  reported 
in  Reference  23.  The  model  is  applicable  to  both  framing  and  snapshot  imagers.  Experiments  with  white 
noise  which  had  both  uniform  and  Gaussian  amplitude  distributions  are  also  reported  in  that  reference.  We 
found  that  only  the  RMS  noise  level  mattered;  the  nature  of  the  amplitude  distribution  did  not  matter.  This 
paper  describes  experiments  to  test  the  TTP  metric  with  various  types  and  levels  of  blur,  noise,  and 
contrast.  Experiments  with  sampled  imagery  are  also  reported  in  this  paper. 

Two displays were used in these experiments. The color display was a high quality CRT computer
monitor. On this display, a 200 pixel image measured 3 inches. The color display was operated in a mode
with 8 bit quantization of the video. Typically, subjects viewed this display from about 18 inches; however,
the viewing distance was not constrained with a chin rest or bite bar. A high resolution, black and white
display was also used. This display provided 10 bit quantization of the output video. On this display, 591
pixels measured 3.25 inches. Typically, subjects viewed the black and white display from about 15 inches.
The viewing distance was not constrained except in Experiment 25. In all of the experiments, the average
display luminance was five fL. Gamma correction was not used for either display. The estimated MTF for
the color display and the measured MTF for the black/white display are shown in Table A1.
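
As a rough check on the display geometry quoted above, the angular size of one display pixel at the eye follows from the pixel pitch and the typical viewing distance. The sketch below uses the small-angle approximation; the function name is illustrative, and the viewing distances are the nominal values from the text rather than controlled measurements.

```python
def pixel_subtense_mrad(pixels, span_inches, viewing_distance_inches):
    """Angular subtense of one display pixel in milliradians (small-angle approximation)."""
    pitch_inches = span_inches / pixels
    return 1000.0 * pitch_inches / viewing_distance_inches


# Color CRT: 200 pixels span 3 inches, viewed from about 18 inches.
color = pixel_subtense_mrad(200, 3.0, 18.0)
# Black and white display: 591 pixels span 3.25 inches, viewed from about 15 inches.
mono = pixel_subtense_mrad(591, 3.25, 15.0)

print(f"color: {color:.3f} mrad/pixel, black/white: {mono:.3f} mrad/pixel")
```

Because viewing distance was generally not constrained, these values are nominal only.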


Table A1. Display MTF

Spatial frequency (cycles/milliradian)   0.1   0.2   0.3   0.4   0.5   0.6   0.8   1.0   1.2   1.4
Black/white horizontal MTF               .98   .94   .86   .77   .66   .56   .35   .20   .10   .04
Black/white vertical MTF                 .98   .92   .83   .71   .59   .47   .26   .12   .05   .02
Color horizontal MTF                     .96   .85   .69   .52   .36   .22   .06   .01
Color vertical MTF                       .97   .89   .77   .63   .48   .35   .15   .05   .02

A.1 Target Acquisition Task

All  of  the  experiments  reported  here  involved  target  identification  (ID).  In  these  experiments,  the  observers 
were  trained  to  identify  twelve  targets.  Images  of  the  targets  were  then  degraded  by  blurring,  adding  noise, 
or  reducing  contrast,  and  the  observers  were  asked  to  identify  the  target  based  on  the  degraded  image. 

The twelve targets are shown in Figures A1 and A2. Both thermal images and visible images were used;
only examples of the thermal imagery are shown here. Figure A1 shows only a side aspect of each target,
but twelve aspects of each of the twelve targets were used during the experiments. Figure A2 shows these
aspects for the T55 Russian tank. Pristine image sets were collected in both the thermal and visible spectral
bands. The size of the images was 401 by 590 pixels. The square root of target area, averaged over all
aspect angles, is 3.11 meters. The target range during imagery collection was 125 meters.


Figure A1. Illustration of the 12 targets used for experiments.

Figure A2. Illustration of the 12 aspects used for each target shown in Figure A1. Target shown is a
T55 Russian tank.
These images were processed to generate the experiments. In all of the experiments, a cell of 24 images
was created for each combination of MTF, noise, contrast, or range. Each cell contained two aspects of all
twelve targets. Each of the aspects shown in Figure A2 was represented twice in each cell. This aspect
distribution was chosen to make task difficulty as uniform as possible between cells. With 144 total images
and 24 images per cell, up to six blur and noise combinations were created without repeating the use of an
original image; these six cells created a “line” in the experiment. In all of the experiments, comparisons
between different MTF types or different noise or contrast levels used the same original images to create
the cells to be compared. This means that the same image was viewed three to five times in one experiment.
Since the experiments contained between 432 and 720 images, it is unlikely that subject learning
occurred because of repeating the images. Cell presentation was random based on a pre-selected order; all
subjects saw the images in the same order. All of the images in one cell were presented sequentially.
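
The cell constraints described above (24 images per cell, two aspects of every target, every aspect twice per cell, and no image reused within a line of six cells) can be met by a simple rotation scheme. The sketch below is one hypothetical assignment that satisfies those constraints; it is not claimed to be the assignment actually used in the experiments, and targets and aspects are represented only by index.

```python
from collections import Counter

N_TARGETS = 12
N_ASPECTS = 12
N_CELLS = 6


def build_cells():
    """Return a list of cells; each cell is a list of (target, aspect) index pairs."""
    cells = []
    for c in range(N_CELLS):
        cell = []
        for t in range(N_TARGETS):
            # Two consecutive aspects per target, rotated by target and cell index.
            for k in (0, 1):
                cell.append((t, (t + 2 * c + k) % N_ASPECTS))
        cells.append(cell)
    return cells


cells = build_cells()
all_images = [image for cell in cells for image in cell]
print(len(all_images), len(set(all_images)))  # 144 144: every image used exactly once
```

With 144 images per line, showing the same originals in three to five lines gives the 432 to 720 images per experiment quoted in the text.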

The subjects were all active military and experienced in the use of thermal imagers. In addition, each
observer was trained to ID the tactical vehicles used in these tests. All observers passed a pre-test with at
least 95% correct; most observers passed with a 100% score. The average number of subjects for an
individual experiment was 15 but varied from a minimum of 9 to a maximum of 23. The perception
laboratory was moved between Army bases in order to maintain a large subject pool.

A.2  Description  of  Experiments 

A.2.1  Experiments  with  well-sampled  imagery 

Gaussian (G), exponential (E), rect, and Difference of Gaussian (DOG) MTF were applied to the images;
these MTF types are illustrated in Figure A3. Noise levels varied from zero to levels that completely
obscured the targets. The magnification varied, depending on the experiment, from 0.63 to 2.8.
Experiments were also performed with average target contrast ranging from 0.013 to 0.205. In some cases
the images were down-sampled, noise added, and then the images were “interpolated up” or electronically
zoomed. This process was needed to increase the impact of the noise on performance. Table A2 provides
details for the MTF and white-noise experiments. In the table, the $e^{-\pi}$ (0.043) amplitude is used to define
frequency cutoff. Frequency is in units of cycles per milliradian in object space. RMS noise is in units of
fL. Noise was random with a Gaussian amplitude distribution. Noise was added to the imagery at run time;
one of 240 pre-stored noise frames was randomly selected and added to the target image at the 60 Hertz
display rate. None of the imagery in these experiments exhibited sampling artifacts. Pre- and post-filtering
was always sufficient that no sampling artifacts were visible.
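
To illustrate the cutoff convention, the sketch below evaluates a Gaussian MTF at its cutoff frequency. The functional form exp(-pi (f/fc)^2) is an assumption chosen so that the amplitude at the cutoff equals the 0.043 value quoted in the text; the report states only the amplitude that defines the cutoff, not the exact expression.

```python
import math


def gaussian_mtf(freq, cutoff):
    """Assumed Gaussian MTF, normalized so that the value at `cutoff` is exp(-pi)."""
    return math.exp(-math.pi * (freq / cutoff) ** 2)


# Object-space cutoffs (cycles per milliradian) listed for Experiment 6a line 1.
cutoffs = [0.11, 0.14, 0.17, 0.23, 0.34, 0.68]
for fc in cutoffs:
    at_cutoff = gaussian_mtf(fc, fc)
    at_half = gaussian_mtf(fc / 2.0, fc)
    print(f"cutoff {fc:.2f}: MTF at cutoff {at_cutoff:.3f}, at half cutoff {at_half:.3f}")
```

Under this assumed form the amplitude at the cutoff is the same for every blur, so the cutoff frequency alone characterizes the blur size.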


Figure A3. Various types of MTF are shown. The dotted line indicates the 0.05 value used to indicate
frequency cutoff in the table. Frequency in cycles per milliradian at the eye.


Table A2. MTF, noise, and contrast experiments

Experiment  MTF   MTF 0.043 cutoff,              Contrast  Down-   RMS    Zoom  Display and
            type  object space (cy/mrad)                   sample  noise        magnification
6a line 1   G     .11, .14, .17, .23, .34, .68   0.205     By 2    0.0    No    Color (2.85)
6a line 2   E     .18, .21, .27, .36, .54, 1.1   0.205     By 2    0.0    No    Color (2.85)
6b line 1   G     .11, .14, .17, .23, .34, .68   0.205     By 2    0.0    No    Color (2.85)
6b line 2   E     .11, .14, .17, .23, .34, .69   0.205     By 2    0.0    No    Color (2.85)
6b line 3   E     .1, .12, .14, .19, .29, .57    0.205     By 2    0.0    No    Color (2.85)
9 line 1    G     .18, .21, .26, .35, .53, 1.05  0.205     By 2    0.0    No    Color (2.85)
9 line 2    E     .07, .08, .1, .13, .2, .41     0.205     By 2    0.0    No    Color (2.85)
9 line 3    DOG   .14, .19, .23, .31, .46, .91   0.205     By 2    0.0    No    Color (2.85)
9 line 4    Rect  .23, .28, .34, .46, .71, 1.4   0.205     By 2    0.0    No    Color (2.85)
13 line 1   G     .23, .27, .34, .46, .68, 1.4   0.205     By 8    0      By 2
A.2.2 Experiments with sampled imagery

Sampling experiments were performed to show that the new metric works when a half-sample cutoff is
imposed. That is, the TTP metric bases image quality on the un-corrupted frequency spectrum. Current
models using the Johnson criteria cannot impose the half-sample cutoff, since this results in pessimistic
performance predictions. In the sampling experiments, the blur, sampling, and display size were varied to
represent the effect of increasing range.

Two experiments were performed. Experiment 25 examined the impact on range performance of different
display interpolations. Visible display-pixel structure, like line raster or the edges of square pixels, tends to
mask the underlying image and decrease range performance. Visible pixel structure is minimized by a good
display interpolation which filters out spectrum beyond the half-sample frequency. In Experiment 25,
aliased signal at less than the half-sample frequency was minimized. Experiment 36 was performed to
explore the impact of large amounts of in-band aliasing on targeting performance. A small detector
fill-factor was used to generate aliased signal at frequencies less than the half-sample rate.

The  sensor  simulated  in  Experiment  25  had  the  following  characteristics.  The  mid-wave  IR,  staring  focal 
plane  array  had  256  by  256  detectors.  The  active  detector  area  was  28  microns  on  a  30  micron  pitch.  The 
sensor  field  of  view  was  2  by  2  degrees.  The  F/2  optics  had  a  22  centimeter  focal  length.  The  simulated 
ranges  were  0.54  to  3.24  kilometers  in  0.54  kilometer  increments. 
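
The sensor parameters above are mutually consistent, which can be checked with a short calculation. The sketch below derives the field of view, the angular sample spacing, and the half-sample frequency from the detector count, pitch, and focal length; the function name is illustrative and the small-angle approximation is assumed.

```python
import math


def sensor_geometry(n_detectors, pitch_m, focal_length_m):
    """Return (field of view in degrees, sample spacing in mrad, half-sample frequency in cy/mrad)."""
    sample_mrad = 1000.0 * pitch_m / focal_length_m
    fov_deg = math.degrees(n_detectors * pitch_m / focal_length_m)
    half_sample = 1.0 / (2.0 * sample_mrad)
    return fov_deg, sample_mrad, half_sample


# Experiment 25 sensor: 256 detectors on a 30 micron pitch behind a 22 centimeter focal length.
fov, sample, half_sample = sensor_geometry(256, 30e-6, 0.22)
print(f"FOV {fov:.2f} deg, sample spacing {sample:.4f} mrad, half-sample {half_sample:.2f} cy/mrad")
```

The computed field of view is about 2 degrees, in agreement with the value stated for the simulated sensor.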

The imagery was displayed on the black and white monitor. Experiment 25 consisted of six lines each with
six ranges (cells) with 24 target calls per cell. Each line used different interpolations to increase image size
(electronically zoom the image); this changed the character of the displayed image by adding variable
amounts of pixel structure. The display interpolations for each line are shown in Table A3. The kernel
shown in Equation (A-1) provided the least amount of display structure; this kernel provides a good filter at
the half-sample rate.

Kernel  =  [.011  0  -.089  0  .58  0  .58  0  -.089  0  .011]  (A-1) 
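
The following sketch shows one way the kernel in Equation (A-1) could be applied for a factor-of-two interpolation of a single image row. The usage is an assumption: original samples are kept at even output positions, and each new odd-position sample is a weighted sum of the six nearest originals, which is what the zero taps in the kernel imply. Samples beyond the ends of the row are treated as zero.

```python
KERNEL = [0.011, 0.0, -0.089, 0.0, 0.58, 0.0, 0.58, 0.0, -0.089, 0.0, 0.011]


def interpolate_2x(samples):
    """Double the sample rate of a 1-D sequence, keeping the original samples."""
    n = len(samples)
    stuffed = [0.0] * (2 * n - 1)
    stuffed[::2] = [float(s) for s in samples]   # originals go to even indices
    half = len(KERNEL) // 2
    out = list(stuffed)
    for i in range(1, len(stuffed), 2):          # odd indices are the new samples
        acc = 0.0
        for k, weight in enumerate(KERNEL):
            j = i + k - half
            if 0 <= j < len(stuffed):            # outside the row counts as zero
                acc += weight * stuffed[j]
        out[i] = acc
    return out


row = [1.0] * 10
zoomed = interpolate_2x(row)
print(len(zoomed), round(zoomed[9], 3))
```

The kernel weights sum to 1.004, so a uniform region is reproduced to within about half a percent at the interpolated positions. A two-dimensional zoom would apply the same operation along rows and then columns.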

Experiment  36  was  performed  to  explore  the  impact  of  large  amounts  of  in-band  aliasing  on  targeting 
performance.  Again,  a  256  by  256  focal  plane  array  was  used.  In  this  experiment,  the  detector  pitch  was  25 
microns.  The  F/2  optics  had  a  7.33  centimeter  focal  length.  Imagery  was  displayed  on  the  black  and  white 
monitor.  Simulated  ranges  were  0.43,  0.64,  0.97,  1.3,  1.6,  and  2.15  kilometers. 

Various amounts and types of aliasing were created by changing detector active area (detector fill factor)
and display technique. In-band aliasing was varied by changing the detector fill factor. Low in-band
aliasing resulted from setting the detector active area to 25 microns (100% fill factor). High in-band
aliasing resulted from setting the detector active area to 1 micron (fill factor of 1/25 in both directions).
Because the small detector fill-factor was associated with a high MTF, the MTF of the sensor used to
collect the pristine images was significant and was modeled in this experiment.
Sensor MTF = frequency^

where frequency is in object space and has units of cycles per milliradian.

To change out-of-band aliasing (visibility of pixels), different display interpolations were used in
Experiment 36 also; these are shown in Table A3. In all cases, sensor imagery was e-zoomed by 11 in both
horizontal and vertical. Low out-of-band aliasing resulted from using the MATLAB bicubic image resize
function to resize by eleven. The bicubic interpolation filtered out-of-band aliasing; no raster or pixel
effects were visible. High out-of-band aliasing was created by using pixel replicate to e-zoom by eleven. In
this case, the pixels were readily visible. The experiment consisted of four lines: (1) no in-band and no
out-of-band aliasing, (2) no in-band with out-of-band, (3) in-band but no out-of-band, and (4) both in-band and
out-of-band aliasing. The decrease in Nresolved due to sampling artifacts is shown in Table A3 as
“sampling factor.”
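
Pixel replication, the interpolation that produced high out-of-band aliasing, is simple to state in code. The sketch below is a minimal illustration using nested lists for the image; it does not reproduce the bicubic resize, and the function name is illustrative.

```python
def replicate_zoom(image, factor):
    """E-zoom a 2-D image (a list of rows) by pixel replication."""
    zoomed = []
    for row in image:
        # Repeat each pixel `factor` times along the row.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times down the image.
        zoomed.extend(list(wide_row) for _ in range(factor))
    return zoomed


tile = [[1, 2], [3, 4]]
big = replicate_zoom(tile, 11)
print(len(big), len(big[0]))  # 22 22
```

Each sensor sample becomes an 11 by 11 block of identical display pixels, which is why the pixel edges are readily visible with this method.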


Table A3. Display interpolations for sampling experiment.

Experiment  1st            2nd            3rd          Total   System         Detector     Sampling
& line      interpolate    interpolate    interpolate  e-zoom  magnification  fill-factor  factor
25 line 1   None           Replicate      Replicate    4       9              Large        0.9
25 line 2   None           Bilinear       Replicate    4       9              Large        0.94
25 line 3   None           Equation (23)  Replicate    4       9              Large        0.97
25 line 4   Replicate      Replicate      Replicate    8       18             Large        0.82
25 line 5   Bilinear       Bilinear       Replicate    8       18             Large        0.93
25 line 6   Equation (23)  Equation (23)  Replicate    8       18             Large        0.97
36 line 1   None           None           Bicubic      11      10.6           Large        0.96
36 line 2   None           None           Replicate    11      10.6           Large        0.8
36 line 3   None           None           Bicubic      11      10.6           Small        0.73
36 line 4   None           None           Replicate    11      10.6           Small        0.62

Viewing distance is a problem with sampling experiments. In terms of performance degradation, the most
serious type of sampling artifact is visible pixel structure. That is, display raster, pixel edges, and other
periodic modulation beyond the half-sample frequency prevent the visual system from integrating the
underlying picture. But eye MTF is a significant post filter. When sampling artifacts are present, the image
can generally be seen better by moving the eye position away from the display. As viewing distance
increases, eye MTF filters out the sampling artifacts. This behavior ruins the experiment. Unfortunately, in
our facility, there is only one display station where viewing distance can be strictly controlled. This station
was used for Experiment 25. The subjects were seated in a reclined chair such that the display viewing
distance could be controlled at 18 inches. Viewing distance was not as well controlled for Experiment 36.
The subjects were placed in fixed-back chairs without casters and warned about excessive head
movement. However, the subjects were not continually challenged to maintain head position. We did
observe subjects sometimes lean back in an apparent effort to better discern the image.

A.2.3  Experiments  with  Colored  Noise 

Two experiments with colored noise were performed; the experiments were identical except that one used
thermal images and the other used visible images. The contrast of the visible image set was 0.37; the
contrast of the thermal set was 0.205. Each experiment consisted of four lines of six blurs each. The six
blurs were created with a Gaussian kernel with $e^{-\pi}$ object-space frequency cutoffs of 0.2, 0.23, 0.27, 0.34,
0.46, and 0.69 cycles per milliradian. Magnification was 0.63, so that frequency cutoffs at the eye were
0.32, 0.37, 0.43, 0.54, 0.73, and 1.1. The images were blurred and then down-sampled by four. Frames of
static, white noise were filtered and then added to the down-sampled images for display on the black and
white monitors. The first line had no noise added, the second line had white noise added, the third line had
low frequency noise added, and the fourth line had high frequency noise added. The MTF of the noise filters
are shown in Figure A4. The RMS of the white noise before filtering was 0.98 fL for the white and high
frequency noise lines. Before filtering, the RMS of the low frequency noise was 18 fL.
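
The idea of coloring white noise by filtering can be sketched in one dimension. The filters below are stand-ins chosen for brevity (a circular moving average for the low-frequency filter, and white noise minus its moving average for the high-frequency filter); they are assumptions and do not reproduce the filter MTFs plotted in Figure A4. Only the pre-filter RMS levels are taken from the text.

```python
import math
import random


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def white_noise(n, rms_level, seed=0):
    """Gaussian white noise with the requested RMS level."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, rms_level) for _ in range(n)]


def low_pass(values, width=9):
    """Circular moving average; a stand-in for the low-frequency noise filter."""
    half = width // 2
    n = len(values)
    return [sum(values[(i + d) % n] for d in range(-half, half + 1)) / width
            for i in range(n)]


def high_pass(values, width=9):
    """White noise minus its low-pass part; a stand-in for the high-frequency filter."""
    return [v - s for v, s in zip(values, low_pass(values, width))]


white_line = white_noise(4096, 0.98)                    # 0.98 fL before filtering
high_line = high_pass(white_line)
low_line = low_pass(white_noise(4096, 18.0, seed=1))    # 18 fL before filtering
print(f"RMS (fL): white {rms(white_line):.2f}, high {rms(high_line):.2f}, low {rms(low_line):.2f}")
```

Low-pass filtering removes most of the noise power, which suggests why a much larger pre-filter RMS was used for the low frequency noise line.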


Figure A4. MTF of filters used to color noise. Spatial frequency is in object space. Curves are shown
for the high frequency and low frequency filters.


A.3  Experimental  Results 

A.3.1  Well-Sampled  Imagery 

The results of the MTF, noise, and contrast experiments described in Section A.2.1 are shown in Figure A5
for the Johnson criteria and in Figure A6 for the TTP metric. In Figure A6, the abscissa is Nresolved based on
Equation (_). TTP values are calculated using Equation (_). When calculating the model predictions for
Figure A5, the Johnson frequency (Fj) is found by the intersection of target contrast with the CTFsys
function. Equation (_) is used to find NJCresolved for the Johnson criteria also, but Fj is used rather than the
TTP value. In both figures, the ordinate is probability of ID. In these figures, Experiments 6 and 9 data with
various MTF shapes are designated with a diamond symbol (◊). Experiment 13 with Gaussian blur and low
amounts of noise is designated by a square (□). Gaussian blur with large amounts of noise is designated by
a triangle (△); these data are from Experiment 20. Experiment 19 data representing exponential MTF with
large amounts of noise are represented by asterisks (*). Gaussian blur on the high resolution display,
Experiment 38, is shown by open circles (○). The low contrast experiment data from Experiment 33 are
shown by filled-in circles (•).


Figure A5. Results of MTF, contrast, and noise experiments for the Johnson criteria.

Figure A6. Results of MTF, contrast, and noise experiments for TTP metric.


The model curves shown in each figure are the Target Transfer Probability Function (TTPF) to use with
each metric. The TTPF curves are logistic functions as defined by Equation (A-3). For the Johnson metric,
NJC50 is 6.98. For the TTP metric, N50 is 20.8.
$P_{ID} = \dfrac{(N/N_{50})^{E}}{1 + (N/N_{50})^{E}}$ (A-3)

where for the TTP metric

$N = N_{resolved}$
$E = 1.33 + 0.33\,(N/N_{50})$ (A-4)

and for the Johnson criteria

$N = N_{JCresolved}$
$E = 1.33 + 0.23\,(N/N_{JC50})$. (A-5)

The PID data represent the average number of correct calls for all observers for each cell of 24 target
images. The experimental data are corrected in two ways. First, the probability of chance is taken out of the
data. That is, the data are adjusted to remove the one in twelve chance that the subject will correctly ID the
target by accident. The data are also corrected for mistakes. Experimentally, the ID probabilities asymptote
to 0.9 rather than 1; there is a 10% mistake rate that does not correlate to cycles on target. Equation (A-6) is
used to correct the measured data.

$P_{ID} = \dfrac{P_{measured} - P_{chance}}{0.9 - P_{chance}}$ (A-6)

It  is  observed  that  some  subjects  do  approach  1.0  probability  with  good  imagery,  but  averages  over  a  group 
of  subjects  do  not.  The  subjects  are  trained  and  tested  before  the  experiment,  and  the  subjects  are  given  rest 
periods.  Prizes  are  awarded  for  the  best  performance,  and  this  appears  to  motivate  the  subjects.  Whether 
performance  would  improve  or  degrade  in  actual  combat  is  not  known.  Certainly  motivation  would 
increase.  However,  these  are  difficult  experiments,  and  it  would  seem  that  getting  nine  out  of  ten  calls 
correct  would  indicate  reasonable  motivation  on  the  part  of  the  subjects.  Whatever  the  source  of  these 
errors,  they  do  not  correlate  to  image  quality. 

As seen in Figure A-6, the TTP metric provides an excellent fit to the data. The new metric predicts
accurately for various shape and size blurs, good and poor intrinsic target contrast, and various levels of
noise. The average error is 0.046 and the maximum error is 0.21. The square of the Pearson’s correlation
(PSQ) is 0.94. Also, the sampling cutoff applied in the noise experiments does not affect model accuracy.
Experiments 13, 19, and 20 had a half-sample frequency of 0.42 cycles per milliradian. The image content
beyond the half-sample frequency was mainly aliased content and represented image corruption. To
generate Figure A-6, the integral for the TTP metric was taken from 0.0 to 0.42 cycles per milliradian. The
TTP metric was not affected by a half-sample frequency cutoff.
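
The three fit statistics quoted here can be computed as in the sketch below. It assumes that "average error" means the mean absolute difference between predicted and observed probability, which the text does not state explicitly. The sample values are hypothetical and are not data from the experiments.

```python
def fit_statistics(predicted, observed):
    """Return (mean absolute error, maximum absolute error, squared Pearson correlation)."""
    n = len(predicted)
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    psq = cov * cov / (var_p * var_o)
    return sum(errors) / n, max(errors), psq


# Hypothetical model predictions and observer results for three cells.
model = [0.1, 0.5, 0.9]
data = [0.2, 0.5, 0.8]
average_error, maximum_error, psq = fit_statistics(model, data)
print(average_error, maximum_error, psq)
```

Note that PSQ measures only linear association, so it can be high even when predictions are biased; the average and maximum errors capture that bias.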

The  Johnson  criteria  are  less  accurate.  In  Figure  A-5,  there  is  a  general  scatter  of  the  data.  The  PSQ  is  0.72, 
the  average  error  is  0.1,  and  the  maximum  error  is  0.32.  There  is  also  a  vertical  line  of  values  at  N  =  10.5 
and  again  at  N  =  22  which  result  from  limiting  to  the  half-sample  frequency.  Figure  A-7  shows  results 
for  the  Johnson  criteria  without  the  half-sample  limit.  Prediction  accuracy  improves  somewhat.  For  the 
experiments  shown,  the  average  error  without  the  frequency  limit  is  0.096  and  the  maximum  error  is  0.36. 
The  PSQ  is  0.75. 


Figure  A-7.  Johnson  predictions 
without  the  half-sample  frequency 
limit  imposed. 
A.3.2 Results of sampled imagery experiments

Nresolved and NJCresolved are decreased by an amount that depends on the sampling artifacts predicted to be
present in the image. The model used to predict the amount of sampling artifacts is described by
Vollmerhausen (2000). Figures A-8 and A-9 show Experiment 25 results and model predictions for the
Johnson criteria and the TTP metric, respectively. In both figures, the abscissa is Nresolved (or NJCresolved) and
the ordinate is PID. As seen in Figure A-8, the Johnson criteria predictions are consistently pessimistic.
However, as seen in Figure A-9, the TTP metric does provide a good fit between model and data with a
PSQ correlation above 0.9. Sampling predictions are pessimistic at long ranges (low metric values). This
occurs because of the nature of the sampling correction. The correction is an empirically derived, fractional
decrease in range performance. As the target gets further from the sensor and therefore smaller on the
display, sampling actually has a greater impact on performance. However, this is currently not modeled.

Figures A-10 and A-11 show results from Experiment 36. Again, the Johnson criteria are pessimistic. The
TTP  metric  accurately  predicts  performance  with  a  PSQ  of  0.93.  In  Figure  A-11,  TTP  model  predictions 
are  accurate  at  long  range  but  pessimistic  at  short  range;  this  is  the  opposite  of  the  Experiment  25  behavior 
shown  in  Figure  A-9.  Remember,  however,  that  the  subjects  moved  their  heads,  optimizing  performance  in 
a  way  not  predicted  by  the  model. 


Figure A-8. Experiment 25 results and Johnson criteria model predictions.

Figure A-9. Experiment 25 results and TTP model predictions.

Figure A-10. Experiment 36 results and Johnson criteria model predictions.

Figure A-11. Experiment 36 results and TTP model predictions.


A.3.3 Results of experiments with colored noise

These experiments were performed to illustrate that the TTP metric can be used to predict performance in
the presence of colored noise. Figure A-12 shows the results for thermal imagery and Figure A-13 shows
results for visible imagery. The N50 for the thermal images is 20.8, and the N50 for the visible images is
28. The TTP model fits the data well; the PSQ value is 0.9 in both cases.

Johnson criteria results are shown in Figures A-14 and A-15 for the thermal and visible images,
respectively. The N50 for the visible images is 5 based on fitting the curve to the no noise and white noise
data. There are systematic errors, particularly for the low frequency noise.


Figure A-12. TTP metric results of colored noise experiment with thermal imagery.

Figure A-13. TTP metric results of colored noise experiment with visible imagery.

Figure A-14. Johnson criteria results of colored noise experiment with thermal imagery.

Figure A-15. Johnson criteria results of colored noise experiment with visible imagery.
Appendix B: Experiments with Low Contrast and Boost

In all of the validation experiments, the single largest error associated with TTP
predictions occurred for low contrast (0.033), no-noise images with high-frequency boost
applied. It initially appeared that the error might be systematic, so further evaluations
were performed. The results of those evaluations provide some insights into the workings
of the model and the pitfalls associated with this type of experimentation.

Experiment 34 used Gaussian blurs with $e^{-\pi}$ MTF cutoffs in object space of 0.2, 0.23,
0.27, 0.34, 0.46, and 0.69 cycles per milliradian. This was an ID experiment as described
in Appendix A. The black and white display was used. The system magnification was
0.63. Since the images were minified compared to object space, frequency cutoff at the
eye is proportionally greater than the cutoff in object space. The experiment consisted of
applying the six Gaussian blurs to the 590 by 401 pixel, thermal images. Four sets of
images were created, two with contrast of 0.11 and two with contrast of 0.033. One
image set at each contrast had high frequency boost applied; see Figure B.1. No noise
was added to the imagery. The imagery was down-sampled by four before presentation.
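
The relationship between object-space and eye-space cutoffs stated above is a division by the system magnification. The short sketch below applies it to the six blurs of this experiment; the function name is illustrative.

```python
def cutoff_at_eye(object_space_cutoff, magnification):
    """Frequency cutoff at the eye for a cutoff given in object space (cycles per milliradian)."""
    return object_space_cutoff / magnification


object_cutoffs = [0.2, 0.23, 0.27, 0.34, 0.46, 0.69]
eye_cutoffs = [round(cutoff_at_eye(f, 0.63), 2) for f in object_cutoffs]
print(eye_cutoffs)  # [0.32, 0.37, 0.43, 0.54, 0.73, 1.1]
```

These are the same eye-space cutoffs listed for the colored noise experiments in Section A.2.3, which used the same blurs and magnification.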


Figure B.1 Plot showing relationship between Gaussian blur, applied boost, and final
“after boost” MTF.


Figure  B.2  shows  the  TTP  predictions  compared  to  the  observer  data.  The  data  have  been 
corrected  for  chance  (0.083  probability)  and  for  mistakes  (0.1  probability).  The  largest 
error  is  for  low  blur,  low  contrast,  with  boost.  There  appears  to  be  a  systematic  error  for 
low contrast predictions, particularly with boost applied. The data are re-plotted in Figures
B.3  and  B.4.  These  figures  show  that  the  performance  improvement  due  to  boost  is 
predicted  well,  but  absolute  predictions  are  pessimistic  for  low  contrast  when  the  blur  is 
small. 
Figure B.2 Results of Experiment 34 showing model and observer data.
Figure B.3 Re-plot of the 0.11
contrast  data  and  model.  The 
“frequency  cutoff  number” 
refers  to  blur  size,  with  small 
blur  at  cutoff  1. 
Figure B.4 Re-plot of the 0.033
contrast  data  and  model.  The 
“frequency  cutoff  number” 
refers  to  blur  size,  with  small 
blur  at  cutoff  1. 
Similar results were obtained for Experiment 33; details for this experiment are described
in Appendix A. This experiment used Gaussian blur and explored the effect of changing
contrast. Contrasts ranged from 0.11 to 0.018. The data are plotted in Figure B.5. Again,
the model is somewhat pessimistic for the smaller blurs. In this case, boost was not used.


Figure  B.5  Results  of 
Experiment  33  showing  model 
TTPF  and  observer  data  for 
0.11,  0.06,  0.033,  and  0.018 
contrast  target  sets.  The  six  data 
points  for  each  contrast  are 
different  Gaussian  blurs. 
When the images from Experiments 33 and 34 were evaluated, it was determined that the
cues  needed  to  ID  the  targets  were  available  in  the  small  blur  image  sets.  The  data  errors 
are  not  statistical.  Remember  that  these  images  were  not  corrupted  by  noise,  and  that  the 
display  used  had  unusual  contrast  dynamic  range  (10  bit).  The  images  associated  with 
high  observer  probability  did  have  good  target  cues. 

To compare performance as contrast changes, the same target set is used as contrast is
changed. In order to avoid many repetitions of showing the same image set, a different
image  set  is  used  for  each  blur  (each  frequency  cutoff).  Experiment  39  was  run  to 
determine  whether  the  model  errors  can  be  explained  by  a  change  in  task  difficulty.  That 
is,  if  the  model  error  is  systematic,  then  changing  the  order  in  which  blur  is  applied  to  the 
experiment  cells  should  not  affect  the  results.  In  Experiment  39,  the  cells  which  in 
Experiments  33  and  34  had  small  blur  were  given  large  blur. 
The results of Experiment 39 are shown in Figure B.6. The model pessimism did
disappear. In Figure B.7, the results of Experiments 33 and 39 for the same contrast and
blur are averaged. No attempt was made to establish the most difficult target set and
average that with the easiest set; the matching occurred by chance. Clearly, a systematic
model error does not exist.


Figure B.6 Results of Experiment 39 showing model TTPF and observer data for 0.11, 0.06,
0.033, and 0.018 contrast target sets. In this experiment, images which previously had
large blur now had small blur.
Figure B.7 Average of observer
results  from  Experiments  33  and 
39.  Averaging  task  difficulty  by 
mixing  the  targets  viewed  at  each 
blur  and  contrast  makes  the 
model  more  accurate. 


This evaluation points up two problems. First, target acquisition cannot really be
predicted until we can predict task difficulty. The target is not in the model; we model
image quality. Second, because human observers learn quickly, the same target image set
cannot be used over and over. But comparing performance based on different target
groupings leads to errors because of the change in task difficulty.
Appendix C: Recognition Experiment

The experiments used to develop the TTP metric used the ID task. The task was kept
consistent so that the model did not change between experiments. With a fixed and
known N50, the TTPF model curve is known and fixed. That same model curve is then
compared to the results of numerous experiments, showing that the model can predict the
impact of changing blur shape and size, noise, contrast, and sampling.

The  primary  reason  for  performing  a  recognition  experiment  is  to  verify  that  the  sampling 
adjustments  are  applicable  to  an  easier  target  acquisition  task  than  ID.  The  recognition 
experiment  is  also  a  further  check  on  the  TTP  metric. 

Previously,  target  recognition  involved  discriminating  between  tanks,  trucks,  and 
armored  personnel  carriers  (APC).  An  N50  of  3  for  the  Johnson  criteria  and  14.5  for  the 
TTP  metric  is  associated  with  this  type  of  recognition  task.  However,  trucks  are  much 
easier  to  discriminate  from  tanks  or  APC  than  APC  from  tanks.  This  recognition  task  is  a 
mixture  of  easy  and  hard  discriminations,  and  does  not  constitute  a  good  target 
acquisition  experiment  for  model  validation. 

Devitt  (2001)  describes  a  new  recognition  set  consisting  of  tracked-armored,  wheeled- 
armored,  and  wheeled-truck.  She  demonstrated  that  the  three  classes  were  equally 
difficult  to  discriminate.  Further,  this  new  recognition  task  has  operational  significance 
because  wheeled  combat  vehicles  are  becoming  more  common.  The  target  set  used  for 
this experiment is shown in Table C.1. Figure C.1 illustrates the three types of vehicles.


Figure C.1 Recognition experiment (tracked-armored, wheeled-armored, and wheeled-truck)
involved many vehicles and aspects; these are examples.


The  Johnson  N50  for  the  new  recognition  task  is  3.5  (Devitt,  2001);  the  TTP  N50  is 
therefore  16.9.  The  conversion  between  N50  values  is  discussed  in  Section  6.  The  square 
root  of  target  area  averaged  over  all  targets  and  aspects  is  2.93  meters.  Average  target 
contrast  is  4.1  K. 
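
The stated TTP N50 can be reproduced with simple arithmetic. The sketch below assumes the conversion is a plain proportional scaling by the ratio quoted for the older tank/truck/APC task (14.5 to 3); Section 6 gives the actual discussion, and the function name is illustrative only.

```python
# N50 values quoted in the text for the older tank/truck/APC recognition task.
JOHNSON_N50_OLD = 3.0
TTP_N50_OLD = 14.5

# Johnson N50 reported for the new recognition task (Devitt, 2001).
JOHNSON_N50_NEW = 3.5


def ttp_n50_from_johnson(johnson_n50):
    """Scale a Johnson N50 to a TTP N50.

    Assumption: the conversion is a simple proportional scaling by the
    ratio of the two N50 values quoted for the older recognition task.
    """
    return johnson_n50 * TTP_N50_OLD / JOHNSON_N50_OLD


ttp_n50_new = ttp_n50_from_johnson(JOHNSON_N50_NEW)
print(round(ttp_n50_new, 1))  # 16.9
```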

A  256  by  256  focal  plane  array  was  used  for  this  experiment.  The  detector  pitch  was  25 
microns.  The  F/2  optics  had  a  7.33  centimeter  focal  length.  Imagery  was  displayed  on  the 
black  and  white  monitor.  Simulated  ranges  were  0.43,  0.64,  0.97,  1.3,  1.6,  and  2.15 
kilometers.  Various  amounts  and  types  of  aliasing  were  created  by  changing  detector 
active  area  (detector  fill  factor)  and  display  technique.  In-band  aliasing  was  varied  by 
changing  the  detector  fill  factor.  Low  in-band  aliasing  resulted  from  setting  the  detector 
active  area  to  25  microns  (100%  fill  factor).  High  in-band  aliasing  resulted  from  setting 
the detector active area to 1 micron (fill factor of 1/25 in both directions).
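
The sensor geometry above fixes how many samples fall across a target at each simulated range. The following is a rough sketch of that arithmetic using only the quoted pitch, focal length, target size, and ranges; the function names are hypothetical, and the small-angle approximation is assumed.

```python
DETECTOR_PITCH_M = 25e-6       # 25 micron detector pitch
FOCAL_LENGTH_M = 7.33e-2       # 7.33 centimeter focal length
ROOT_TARGET_AREA_M = 2.93      # square root of target area, in meters
RANGES_KM = [0.43, 0.64, 0.97, 1.3, 1.6, 2.15]


def sample_spacing_mrad():
    """Angular sample spacing: detector pitch divided by focal length."""
    return 1000.0 * DETECTOR_PITCH_M / FOCAL_LENGTH_M


def nyquist_cycles_per_mrad():
    """Half the sample rate, in cycles per milliradian."""
    return 1.0 / (2.0 * sample_spacing_mrad())


def samples_across_target(range_km):
    """Samples spanning the root-area target dimension at a given range."""
    target_mrad = ROOT_TARGET_AREA_M / range_km  # meters per km equals mrad
    return target_mrad / sample_spacing_mrad()


def linear_fill_factor(active_width_um, pitch_um=25.0):
    """Linear fill factor: active width over pitch (1/25 for 1 micron)."""
    return active_width_um / pitch_um


for r in RANGES_KM:
    print(r, round(samples_across_target(r), 1))
```

Under these assumptions the target spans roughly 20 samples at the shortest range and roughly 4 at the longest.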
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To  change  out-of-band  aliasing  (visibility  of  pixels),  different  display  interpolations  were 
used in the experiment; these are shown in Table C.2. In all cases, sensor imagery was
e-zoomed by 11 in both horizontal and vertical. Low out-of-band aliasing resulted from
using the MATLAB bicubic image resize function to resize by eleven. The bicubic
interpolation filtered out-of-band aliasing; no raster or pixel effects were visible. High
out-of-band aliasing was created by using pixel replicate to e-zoom by eleven. In this
case, the pixels were readily visible. The experiment consisted of four lines: (1) no
in-band and no out-of-band aliasing, (2) no in-band with out-of-band, (3) in-band but no
out-of-band, and (4) both in-band and out-of-band aliasing.

Nresolved is decreased by an amount that depends on the sampling artifacts predicted to be
present in the image; this is discussed in Section 7. The amount that Nresolved is decreased
for each line of the experiment is shown as the "sampling factor" in Table C.2.

Figure C.2 shows the observer data and the model predictions. The fit between model and
data is excellent. Both the TTP metric and the adjustment for sampling artifacts are
applicable to the recognition task.


Figure C.2 Model (solid line) and observer data
for recognition sampling experiment. (Legend entries: line 2, line 3, line 4, model TTPF.)
Table C.1 Vehicles and aspects used in the recognition experiment


TRACKED

Target         Aspect
2S1            3NE, 7NG
2S3            0NG, 7NG
ACRV           2NE, 6NG
AVLB           1DE
BMP-1          6NG, 7NE
M1064          2NG
M109A5         1DG
M109A6         3DE, 5NG
M113           4NG
M1A1           0NE
M1IP           4NG, 5DE
M2             2NE, 3NE
M48            1DG
M548           2NE, 3DG
M551           4NE, 1NE
M577           0NE, 7DG
M578           5DE
M60A3          0NG, 7NE
M728           0DE, 6DG
M88            5DG, 7DE
M901           4DE, 5DE
M992           4NE
MTLB           3NG, 6NE
T-55           2NG, 6NG
T-62           4DG, 1NE
T-72           3NE, 6NG
T-72 (Reac)    1NG
ZSU-23/4       0NG, 2NE
M41            5DE

WHEELED TRUCK (SOFT)

Target         Aspect
HEMMT          0NG, 2NG, 4NG, 6NG
M35            0NG, 2NG, 4NG, 6NG
ASTROS         3DG, 4NE, 5NE, 1DG, 6NG
FMTV/Lt        3NE, 5NE, 7NE
FMTV/Md        0NG, 2NG
FROG-7         1DE, 4DG, 5DG, 1DG, 6DE
GAZ-66         0NG, 3NE, 2NG, 5NE, 7NE
GRAD-1         4DG, 5DG, 1DG, 6NE, 7NE
HMMWV          0NG, 3NG, 2NE, 7NG, 1NE
HMMWV-TOW      6NE, 7NE, 1DG, 4DE, 5NG
STYX           4NG, 5NE, 1DE, 6DG, 3NE

WHEELED ARMORED

Targets: BRDM-2, BRDM-2 AT, BTR-70, LAV-25, LAV-AD, LAV-AT, LAV-CC, LAV-M, LAV-Rc, M-93A1

Aspects (48 in total; the per-vehicle grouping did not survive extraction, so the
entries are listed in source order):

4NE, 5NG, 6DG, 0NE, 0NE, 3NE, 4NG, 6NE, 4NE, 5NG, 6DG, 1NG, 0DG, 3DE,
0DG, 5DE, 0DG, 1DE, 4NE, 5NG, 4DG, 3DG, 4DE, 5NG, 0NG, 3DE, 2DE, 5NE,
1DG, 2NG, 5NE, 1DG, 3NG, 7NG, 6DG, 7NG, 1NG, 6NE, 2DE, 7DG, 2DG, 6DG,
2NG, 7DE, 7DG, 3DE, 2NE, 7DG

ASPECT KEY

FIRST CHARACTER
0 = front, 1 = left front, 2 = left flank,
3 = left rear, 4 = rear, 5 = right rear,
6 = right flank, 7 = right front

SECOND CHARACTER
N = night 8-12 micron thermal
D = day 8-12 micron thermal

THIRD CHARACTER
G = 0 elevation
E = 7 degree elevation
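
The three-character aspect codes in Table C.1 can be decoded mechanically from the key above. The helper below is an illustrative sketch; its name and return format are not part of the original report.

```python
VIEW = {
    "0": "front", "1": "left front", "2": "left flank", "3": "left rear",
    "4": "rear", "5": "right rear", "6": "right flank", "7": "right front",
}
BAND = {"N": "night 8-12 micron thermal", "D": "day 8-12 micron thermal"}
ELEVATION = {"G": "0 elevation", "E": "7 degree elevation"}


def decode_aspect(code):
    """Split a code such as '3NE' into (view, band, elevation) descriptions."""
    code = code.strip().upper()
    if len(code) != 3:
        raise ValueError("aspect code must have exactly three characters")
    try:
        return VIEW[code[0]], BAND[code[1]], ELEVATION[code[2]]
    except KeyError as exc:
        raise ValueError("unknown aspect character: %s" % exc) from None


print(decode_aspect("3NE"))
```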


Table C.2 Interpolations, fill factors, and Nresolved
sampling factor for each experiment line


line   interpolate   E-zoom   System          Detector      Sampling
                              magnification   fill-factor   factor
1      Bicubic       11       10.6            Large         0.96
2      Replicate     11       10.6            Large         0.8
3      Bicubic       11       10.6            Small         0.73
4      Replicate     11       10.6            Small         0.62
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Appendix  D.  ID  Performance  with  speckle  imagery 


The ability of the TTP metric to predict performance of humans viewing images produced by laser range
gated (LRG) imagers was investigated. A perception experiment was designed and an image set was
simulated. The simulated sensor was modeled on current electron bombarded CCD (EBCCD) technologies.
Because the phenomenology of laser range gated sensors is a mix of coherent and incoherent effects, both
types of processes had to be represented in the imaging chain. Figure D.1 shows where in a representative
imager the transfer characteristics of the sensor are calculated as field intensity (coherent) or power
(incoherent).


Figure D.1 Imaging chain in LRG sensor. (Diagram labels: Collection Optics, EBCCD;
Electric Field Intensity Transfer, Power Transfer.)

To  simulate  the  coherent  portion  of  the  imaging  chain,  the  field  from  the  object  was  propagated  through  the 
collection  optics  and  imaged  onto  the  image  plane  of  the  camera.  Taking  the  square  root  of  the  gray  level 
in  a  panchromatic  image  of  the  target  simulated  the  amplitude  of  the  coherent  field  incident  on  the  sensor 
aperture. The phase of each pixel in the object was chosen from a uniformly distributed random variable over
the interval [0, 2π) since the target was considered to be rough compared to the wavelength of the laser
illumination.  For  each  point  on  the  source  object,  a  coherent  impulse  response  (blur)  was  created  in  the 
image  plane.  Since  the  source  points  had  random  phases,  the  resulting  image  was  formed  through  the 
interference  of  a  number  of  impulse  responses  at  the  image  plane.  The  resulting  output  field  at  the  image 
plane  was  the  complex  input  (electric  field)  convolved  with  the  coherent  impulse  response.  The  field  was 
then  converted  to  irradiance  by  squaring  the  magnitude  of  the  field  at  the  focal  plane.  All  other  blurs  in  the 
sensor,  including  the  electron  proximity  focus  and  detectors  of  the  EBCCD,  were  linear  with  respect  to  the 
irradiance  and  were  applied  as  point  spread  functions  in  power. 
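
The coherent steps described above can be sketched in one dimension. This is a minimal illustration, not the simulation used for the experiment: the scene, the kernel values, and all function names are hypothetical, and only the sequence of operations (square root, random phase, complex convolution, squared magnitude, averaging of independent shots) follows the text.

```python
import cmath
import math
import random


def coherent_image(gray, kernel, rng):
    """Return one speckle realization (irradiance) of a 1-D scene.

    gray   : non-negative gray levels, treated as source power
    kernel : samples of a coherent impulse response (hypothetical values)
    """
    # Field amplitude is sqrt(gray); phase is uniform on [0, 2*pi).
    field = [cmath.rect(math.sqrt(g), 2.0 * math.pi * rng.random()) for g in gray]
    half = len(kernel) // 2
    image = []
    for i in range(len(field)):
        acc = 0j
        for k, h in enumerate(kernel):
            j = i - (k - half)          # convolution index
            if 0 <= j < len(field):
                acc += field[j] * h
        image.append(abs(acc) ** 2)     # irradiance is |field|^2
    return image


def averaged_image(gray, kernel, shots, rng):
    """Average several independent speckle realizations."""
    total = [0.0] * len(gray)
    for _ in range(shots):
        for i, value in enumerate(coherent_image(gray, kernel, rng)):
            total[i] += value
    return [t / shots for t in total]


def speckle_contrast(image):
    """Standard deviation divided by mean."""
    mean = sum(image) / len(image)
    var = sum((v - mean) ** 2 for v in image) / len(image)
    return math.sqrt(var) / mean


rng = random.Random(0)
scene = [100.0] * 512                   # uniform scene, for illustration
kernel = [0.1, 0.2, 0.4, 0.2, 0.1]      # made-up coherent blur
single = averaged_image(scene, kernel, 1, rng)
eight = averaged_image(scene, kernel, 8, rng)
print(speckle_contrast(single), speckle_contrast(eight))
```

On a uniform scene the eight-shot average shows markedly lower speckle contrast than the single shot, which is the effect the averaging conditions are meant to exploit.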

The characteristics of the simulated sensor are given in Table D.1. Using these characteristics, the coherent
and  incoherent  impulse  response  functions  for  the  optics  and  detectors  were  created. 


Table D.1. Simulated sensor parameters.


Parameter                                Value
Wavelength                               1.57 microns
Aperture diameter (centrally obscured)   125 mm
Aperture obscuration fraction            0.3125
Focal length                             1250 mm
Detector size                            13 microns square
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The  coherent  impulse  response  or  point  spread  function  (PSF)  of  the  optics  was  calculated  using  the 
following  equation 


$$\mathrm{PSF}_{optics}(r) = \mathrm{somb}\!\left(\frac{r}{r_0}\right) - s^{2}\,\mathrm{somb}\!\left(\frac{s\,r}{r_0}\right) \qquad \text{(D.1)}$$


where r is an angular subtense measured from the sensor, r_0 is the ratio of the aperture diameter to the
focal length, and s is the aperture obscuration fraction. The incoherent impulse response of the electron
proximity focus was implemented as a filter in the spatial frequency domain. The filter function was
modeled by fitting a supergaussian to measured MTF data. The resulting MTF is given by


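$$\mathrm{MTF}_{prox}(f) = \exp\!\left[-\left(\frac{f}{\beta}\right)^{\gamma}\right] \qquad \text{(D.2)}$$


where γ was found to be 1.64 and β was found to be 25.5 cycles per milliradian of angle measured from the
sensor. The incoherent detector PSF was calculated using


$$\mathrm{PSF}_{det}(r) = \mathrm{rect}\!\left(\frac{r}{w}\right) \qquad \text{(D.3)}$$
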
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where  w  is  the  angular  subtense  of  a  detector  measured  at  the  sensor.  After  application  of  all  blurs,  the 
imagery  was  downsampled  by  a  factor  of  two,  which  resulted  in  imagery  having  295  horizontal  pixels  and 
200  vertical  pixels. 
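
A few derived quantities follow directly from the Table D.1 parameters. The sketch below is a worked-arithmetic aid only; the variable names are illustrative, and the diffraction cutoff uses the standard aperture-over-wavelength relation for incoherent imaging rather than anything stated in this appendix.

```python
WAVELENGTH_UM = 1.57        # laser wavelength, microns
APERTURE_MM = 125.0         # aperture diameter, millimeters
FOCAL_LENGTH_MM = 1250.0    # focal length, millimeters
DETECTOR_UM = 13.0          # detector size, microns

# F-number of the collection optics.
f_number = FOCAL_LENGTH_MM / APERTURE_MM

# Detector angular subtense w: microns over millimeters gives milliradians.
detector_subtense_mrad = DETECTOR_UM / FOCAL_LENGTH_MM

# Incoherent diffraction cutoff D / wavelength: millimeters over microns
# gives cycles per milliradian (standard optics result, assumed here).
incoherent_cutoff_cyc_per_mrad = APERTURE_MM / WAVELENGTH_UM

print(f_number, detector_subtense_mrad, round(incoherent_cutoff_cyc_per_mrad, 2))
```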

The  system  described  above  was  simulated  under  four  conditions.  The  first  condition  was  incoherent 
(spatial  and  temporal  incoherence).  Under  this  condition,  no  random  phase  was  applied  to  the  pristine 
image.  The  second  condition  was  a  single  shot  LRG-SWIR  mode  where  temporal  coherence  was 
maintained  and  spatial  phase  was  randomized,  thus  creating  a  speckle  image.  The  third  condition  was  a 
two-pulsed average image and the fourth condition was an eight-pulsed average image. The averaging
decreased the effect of the laser speckle.

The  sensor  simulation  was  applied  to  576  images  that  were  presented  to  U.S.  Army  soldiers  as  part  of  a 
perception  experiment.  The  primary  purpose  of  the  experiment  was  to  determine  the  impact  of  speckle  on 
target  identification  performance.  The  standard  NVESD  identification  target  set  is  shown  in  Figure  D.2. 
“Probability  of  identification”  (PID)  was  established  by  NVESD  as  the  ability  of  an  observer  to  identify 
one  of  these  targets  from  the  other  eleven  targets.  The  target  set  included  these  12  targets  with  12  aspects 
resulting  in  144  pristine  images  that  were  processed  four  different  ways  to  produce  the  576  perception  test 
images.  The  targets  were  chosen  for  their  relative  confusability  and  tactical  significance.  The  left  flank  of 
each  vehicle  in  the  visible  target  set  is  shown  in  Figure  D.2.  The  visible  images  were  collected  using 
35mm  cameras  with  color  film  at  a  range  of  25  meters.  The  film  was  digitized,  converted  to  grayscale,  and 
processed  to  have  a  resolution  of  1.8  cm  per  sample  in  both  horizontal  and  vertical  directions.  The  images 
were  used  as  the  source  power  that  was  converted  to  an  electric  field  in  the  simulation.  Examples  of  the 
simulated  images  are  shown  in  Figure  D.3. 
Figure D.2. Target set used in perception experiment. (Legible panel labels: 2S3, BMP, M109, M113,
M1A, M2, M551, M60.)


Figure D.3. Simulated speckle images at 5 km. 1. Incoherent. 2. Coherent - no averaging. 3.
Coherent - 2 speckle average. 4. Coherent - 8 speckle average.


The  perception  experimental  design  is  outlined  in  Table  D.2.  From  the  pristine  image  sets  of  Figure  D.2, 
12  targets  with  12  aspects  (144  images  total)  were  distributed  evenly  across  the  columns  of  the  table 
shown.  This  distribution  resulted  in  only  two  of  the  same  target  image  in  each  column  and  two  of  the  same 
aspect  in  each  column  yielding  24  pristine  images  associated  with  each  column.  The  24  images  were 
processed  with  a  prescribed  range.  In  the  column  labeled  “1,”  24  pristine  images  were  blurred  to 
correspond  to  a  1-km  range  through  an  incoherent  sensor  model  and  placed  in  cell  “AA.”  Also,  24  pristine 
visible  images  (the  same  targets  and  aspects  as  those  in  AA)  were  processed  with  a  1-km  range  through  a 
single  laser  range-gate  shot  sensor  model  and  placed  in  cell  “AB”  and  so  on. 

Table D.2. Experiment Design


Range (km)            1     3     5     7.5   10    15
Incoherent            AA    BA    CA    DA    EA    FA
Single Shot           AB    BB    CB    DB    EB    FB
Two Shot Average      AC    BC    CC    DC    EC    FC
Eight Shot Average    AD    BD    CD    DD    ED    FD
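
The cell labels follow a simple rule: the first letter indexes the range column and the second letter indexes the sensor condition. The sketch below enumerates the design under that reading and checks the image count; the names are illustrative.

```python
RANGES_KM = [1, 3, 5, 7.5, 10, 15]
MODES = ["Incoherent", "Single Shot", "Two Shot Average", "Eight Shot Average"]
LETTERS = "ABCDEF"
IMAGES_PER_CELL = 24   # 24 pristine images are assigned to each range column


def cell_label(range_index, mode_index):
    """First letter: range column. Second letter: sensor condition."""
    return LETTERS[range_index] + LETTERS[mode_index]


cells = {
    (rng_km, mode): cell_label(i, j)
    for i, rng_km in enumerate(RANGES_KM)
    for j, mode in enumerate(MODES)
}
total_images = IMAGES_PER_CELL * len(cells)
print(cells[(5, "Single Shot")], total_images)  # CB 576
```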

The  standard  NVESD  target  set  was  processed  in  a  manner  depicted  in  Table  D.2.  A  comparison  of 
speckle  images  is  shown  in  Figure  D.3.  The  images  shown  were  simulated  at  5  km  with  an  incoherent 
process,  a  single-shot  laser  pulse,  a  2-shot  image  average,  and  an  8-shot  image  average.  Image  averages 
were  obtained  by  adding  independent  speckle  images.  The  incoherent  image  was  taken  as  a  baseline  for 
imagery  comparison.  It  can  be  easily  seen  that  the  speckle  from  the  single-shot  image  can  degrade  the 
identification  performance  significantly. 

Fifteen  soldiers  were  trained  to  identify  targets  with  99%  proficiency  prior  to  participating  in  the 
experiment.  The  experimental  cells  were  randomized  to  vary  the  level  of  target  identification  difficulty. 
The  images  were  displayed  on  high-resolution  grayscale  monitors.  The  monitors  have  Gaussian  MTFs  with 
equivalent  spot  sizes  of  0.306  mm  horizontally  and  0.237  mm  vertically.  There  were  approximately  70.9 
pixels  per  centimeter  on  the  displays.  The  images  were  displayed  with  an  average  luminance  of  5  fL  and 
were  viewed  from  a  nominal  distance  of  15  inches.  After  correcting  for  chance  guesses  and  mistakes,  the 
average  probability  of  ID  for  each  cell  in  Table  D.2  is  given  in  Table  D.3.  The  probability  of  a  chance 
guess  was  1  in  12  and  the  probability  of  mistake  was  1  in  10. 
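
The text does not spell out the correction formula. One commonly used form, shown here purely as an assumption, rescales the raw fraction correct so that the chance rate maps to zero and the mistake-limited ceiling maps to one.

```python
P_CHANCE = 1.0 / 12.0    # probability of a chance guess (12 targets)
P_MISTAKE = 1.0 / 10.0   # probability of a mistake, as quoted in the text


def corrected_pid(raw_fraction_correct):
    """Rescale raw fraction correct for chance and mistakes.

    Assumed form (not confirmed by the source):
        P = (P_raw - P_chance) / ((1 - P_mistake) - P_chance)
    """
    ceiling = 1.0 - P_MISTAKE
    return (raw_fraction_correct - P_CHANCE) / (ceiling - P_CHANCE)


print(round(corrected_pid(0.5), 3))
```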
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Table  D.3.  Percent  Correct  Identification 


        A        B        C        D        E        F
A     96.6%    99.5%    81.2%    83.3%    87.7%    72.4%
B     99.0%    67.7%    26.2%    15.8%    10.9%     1.6%
C     99.7%    71.0%    40.1%    21.3%    17.9%     5.1%
D     99.3%    83.7%    64.7%    45.9%    32.7%    20.2%

In  order  to  predict  the  performance  for  each  cell  in  the  perception  experiment,  the  contrast  threshold 
function  (CTF)  of  the  system  (including  the  observer)  had  to  be  computed.  The  system  CTF  would  then  be 
used  to  compute  the  target  task  performance  (TTP)  metric  from  which  the  probability  of  identification 
(PID)  could  be  derived.  To  simplify  these  performance  predictions,  it  was  assumed  that  all  the  spatial 
distortions  of  the  sensor  could  be  applied  incoherently  and  that  speckle  acted  as  additional  display  noise. 

In  computing  the  system  CTF,  distinctions  must  be  made  in  the  definition  of  spatial  frequency.  This  is  due 
to  the  different  ways  of  measuring  angles  in  the  system.  From  a  sensor  point  of  view,  angles  can  be 
determined  by  the  size  of  the  target  and  the  range.  From  the  observer’s  point  of  view,  angles  are  determined 
by  the  size  of  the  target  on  the  display  and  the  observation  distance.  The  two  angles  are  related  by  the 
magnification  of  the  system.  The  relationship  between  the  two  spatial  frequencies  is  given  by 

$$f_s = M \cdot f_e \qquad \text{(D.4)}$$


where f_s is spatial frequency measured at the sensor, M is the magnification, and f_e is spatial frequency
measured  at  the  eye.  In  this  experiment,  all  images  were  displayed  at  the  same  size  with  the  blur  from  the 
system  increasing  with  increasing  range.  Therefore,  for  the  image  sizes  and  display  characteristics  used  in 
this  experiment,  the  relation  between  range  and  magnification  was  given  by 

$$M = 0.01028\,R \qquad \text{(D.5)}$$


where  R  is  the  range  in  meters. 
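
As a quick illustration of how range, magnification, and the two frequency scales relate, the sketch below assumes that magnification grows linearly with range with the 0.01028 coefficient quoted above, and that sensor-space frequency is M times eye-space frequency. Both readings are assumptions drawn from the surrounding text, and the function names are hypothetical.

```python
def magnification(range_m):
    """System magnification for a range in meters (assumed linear in range)."""
    return 0.01028 * range_m


def sensor_frequency(f_eye_cyc_per_mrad, range_m):
    """Convert spatial frequency at the eye to frequency at the sensor."""
    return magnification(range_m) * f_eye_cyc_per_mrad


for range_km in (1, 3, 5, 7.5, 10, 15):
    print(range_km, round(magnification(1000.0 * range_km), 2))
```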

The  system  CTF  was  computed  in  spatial  frequency  measured  at  the  eye.  The  equation  used  for  the  system 
CTF  is  given  by 


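$$\mathrm{CTF}_{sys}(f_e) = \frac{\mathrm{CTF}_{eye}(f_e)}{\mathrm{MTF}_{sys}(M \cdot f_e)}\sqrt{1 + \frac{\alpha^{2}\sigma^{2}}{L^{2}}} \qquad \text{(D.6)}$$


where CTF_eye is the CTF of the observer's unaided eye, MTF_sys is the system MTF, α is a calibration
constant which has been found to be 169.6, σ is the standard deviation of the display noise, and L is the
average display luminance. The system MTF is found from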

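$$\mathrm{MTF}_{sys}(M \cdot f_e) = \mathrm{MTF}_{optics}(M \cdot f_e)\,\mathrm{MTF}_{prox}(M \cdot f_e)\,\mathrm{MTF}_{det}(M \cdot f_e)\,\mathrm{MTF}_{disp}(f_e) \qquad \text{(D.7)}$$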


The  MTF  of  the  proximity  focus  (MTFprox)  is  given  by  equation  D.2.  The  incoherent  MTF  of  the  optics  is 
given  by 


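$$\mathrm{MTF}_{optics}(f) = \frac{A + B + C}{1 - s^{2}} \qquad \text{(D.8a)}$$


where


$$A = \begin{cases} \dfrac{2}{\pi}\left[\cos^{-1}\!\left(\dfrac{f}{f_0}\right) - \dfrac{f}{f_0}\sqrt{1 - \left(\dfrac{f}{f_0}\right)^{2}}\,\right], & f \le f_0 \\[2ex] 0, & f > f_0 \end{cases} \qquad \text{(D.8b)}$$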
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(D.8d) 
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(D.12) 


cr; 


00 

=  I S.  (v|MrF„„  (v)MrF,,„  (v)MrF„,  (v)MrF,„  (v)MrF„,.  (vf  dv , 


and 


2 

cr„  = 


=  Js.(>’|MrF„„(v)MrF„„(v)MrF„,(v)MrF,„(v|’dv  ,  (D.13) 


S_n(ν) is the power spectral density (PSD) of the noise. For speckle, the PSD is found from


S,{^)  =  L- 


00 


(D.14) 


where  L  is  the  average  display  luminance. 


Two  new  MTFs  are  introduced  in  Equation  (D.12)  and  (D.13).  The  first  is  the  MTF  of  the  eye  which  is 
given  by 


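$$\mathrm{MTF}_{eye}(\nu) = \mathrm{MTF}_{eye\_optics}(\nu)\,\mathrm{MTF}_{retina}(\nu)\,\mathrm{MTF}_{tremor}(\nu) \qquad \text{(D.15)}$$


where


$$\mathrm{MTF}_{eye\_optics}(\nu) = \exp\!\left[-\left(\frac{a\,\nu}{M}\right)^{b}\right] \qquad \text{(D.16)}$$

$$\mathrm{MTF}_{retina}(\nu) = \exp\!\left[-0.375\left(\frac{\nu}{M}\right)^{1.21}\right] \qquad \text{(D.17)}$$

$$\mathrm{MTF}_{tremor}(\nu) = \exp\!\left[-0.4441\left(\frac{\nu}{M}\right)^{2}\right] \qquad \text{(D.18)}$$

The values of a and b in equation (D.16) are found from

$$a = 43.69\left[\exp\!\left(3.663 - 0.0216\,D_{pupil}^{2}\,\ln D_{pupil}\right)\right]^{-1} \qquad \text{(D.19)}$$

and

$$b = \left(0.7155 + \frac{0.277}{\sqrt{D_{pupil}}}\right)^{2} \qquad \text{(D.20)}$$


with D_pupil being the diameter of the pupil. Pupil diameter is dependent upon light level and can be
calculated from


$$D_{pupil} = -9.011 + 13.23\,\exp\!\left(-\frac{\log_{10} L}{21.082}\right)\ \text{(mm)} \qquad \text{(D.21)}$$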


where  L  is  the  luminance  in  foot-Lamberts. 




The second new MTF in equation (D.12) is a perceptual filter that describes the spatial frequency response of
the visual cortex channel and is given by


$$\mathrm{MTF}_{perceptual}(\nu) = \exp\!\left[-2.2\,\log^{2}\!\left(\frac{\nu}{f}\right)\right] \qquad \text{(D.22)}$$


where f is the frequency at which the CTF is being evaluated. This filter is only applied in the horizontal
direction.

Once  the  system  CTF  is  calculated  for  a  given  range,  the  TTP  metric  can  be  computed.  The  TTP  metric  is 
given  by 


$$\mathrm{TTP} = \theta_{tgt}\int_{f_1}^{f_2}\sqrt{\frac{C}{\mathrm{CTF}_{sys}(f)}}\;df \qquad \text{(D.23)}$$


where f_1 and f_2 are defined as the points where the contrast value C intersects CTF_sys. The average size
of a target in pixels was 7021. With an observation distance of 38.1 cm and approximately 70.9 pixels per cm
on the display, the target critical dimension (the leading term θ_tgt in equation (D.23)) was 31 mrad. Using
this value, equation (D.23) was evaluated for every cell in the test matrix. The results are shown in Table D.4.
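
The 31 mrad figure can be checked from the quoted display numbers. The sketch below assumes the critical dimension is the square root of the target area in pixels, converted to centimeters on the display and then to an angle at the viewing distance; the variable names are illustrative.

```python
import math

TARGET_AREA_PX = 7021          # average target size in pixels
PIXELS_PER_CM = 70.9           # display pixel density
VIEWING_DISTANCE_CM = 38.1     # 15 inches

# Assumption: the critical dimension is the square root of the target area.
side_px = math.sqrt(TARGET_AREA_PX)
side_cm = side_px / PIXELS_PER_CM

# Small-angle approximation: size over distance, in milliradians.
critical_dimension_mrad = 1000.0 * side_cm / VIEWING_DISTANCE_CM

print(round(critical_dimension_mrad, 1))  # 31.0
```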

Table D.4. TTP Metric Values for Perception Experiment.

        A         B         C         D         E         F
A     193.29    143.12    106.16     76.97     58.50     37.44
B      75.28     31.97     18.79     10.98      6.76      2.09
C      89.86     23.84     23.84     14.93     10.06      4.71
D     121.87     59.19     36.88     23.98     17.16     10.03

The  metric  values  in  Table  D.4  were  converted  to  PID  using  the  target  transform  probability  function 
(TTPF).  The  TTPF  is  given  by 


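$$P = \frac{\left(\mathrm{TTP}/V_{50}\right)^{E}}{1 + \left(\mathrm{TTP}/V_{50}\right)^{E}}, \qquad E = 1.33 + 0.23\left(\frac{\mathrm{TTP}}{V_{50}}\right) \qquad \text{(D.24)}$$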


where  V50  is  the  value  of  the  metric  necessary  for  the  task  to  be  performed  50%  of  the  time.  For 
identification,  the  value  of  V50  has  been  determined  to  be  28. 
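
The conversion from metric value to probability can be sketched as below. The logistic-style ratio form and V50 = 28 come from the text; the exponent, read here as E = 1.33 + 0.23 (TTP/V50), is an assumption because that part of the printed equation is not fully legible. The TTP values are the single-shot row of Table D.4.

```python
V50_ID = 28.0   # metric value giving 50% identification, from the text


def ttpf(ttp, v50=V50_ID):
    """Target transform probability function.

    Assumption: the exponent is E = 1.33 + 0.23 * (TTP / V50).
    """
    ratio = ttp / v50
    exponent = 1.33 + 0.23 * ratio
    x = ratio ** exponent
    return x / (1.0 + x)


# Single-shot row of Table D.4 (ranges 1, 3, 5, 7.5, 10, 15 km).
single_shot_ttp = [75.28, 31.97, 18.79, 10.98, 6.76, 2.09]
predicted_pid = [ttpf(v) for v in single_shot_ttp]
print([round(p, 2) for p in predicted_pid])
```

By construction the function returns exactly 0.5 when the metric equals V50, whatever exponent is used.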

Using the values from Table D.4 with the TTPF, the PID values were predicted. These values are compared
with the measured PID values in Figure D.4.

The  TTPF  explains  approximately  94%  of  the  variance  in  the  measured  data. 
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Appendix  E.  Model  Details 
1.  EYE  MTF 

The  eye  MTF  is  taken  from  Overington  (1976).  The  eye  MTF  includes  factors  for 
refraction  optics,  retina,  and  tremor;  each  of  these  is  calculated  with  a  numerical  fit  to  the 
data  in  Overington.  Eye  MTF  varies  with  light  level  because  it  depends  on  pupil 
diameter.  See  Table  E-1  for  pupil  diameter  versus  display  light  level.  These  pupil 
diameters  are  for  one  eye;  the  pupil  diameter  for  two  eyes  is  about  0.5  mm  smaller. 


Table E-1. Pupil Diameter Versus Light Level


Pupil Diameter (mm)     7.0   6.2   5.6   4.9   4.2   3.6   3.0   2.5
Light Level (LOG fL)    -4    -3    -2    -1    0     +1    +2    +3

fr  =  frequency  at  the  eye  in  cycles  per  milliradian 

bb = LOG10(display luminance)

dpul = -9.011 + 13.23 * EXP(-bb / 21.082)

dpul = dpul - (eye# - 1) * .5

e0 = (.7155 + .277 / dpul ^ .5) ^ 2

fi = EXP(3.663 - .04974 * dpul ^ 2 * LOG10(dpul))

fe = 43.69 * fr

eye MTF = EXP(-(fe / fi) ^ e0) * EXP(-.375 * (fr) ^ 1.21) * EXP(-.4441 * fr * fr)
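
The listing above translates almost line for line into Python. The version below is a sketch of that translation; the function and argument names are illustrative, and display luminance is taken to be in foot-Lamberts, matching Table E-1.

```python
import math


def pupil_diameter_mm(luminance_fl, num_eyes=1):
    """Pupil diameter from display luminance in foot-Lamberts."""
    bb = math.log10(luminance_fl)
    dpul = -9.011 + 13.23 * math.exp(-bb / 21.082)
    # The two-eye pupil is about 0.5 mm smaller than the one-eye pupil.
    return dpul - (num_eyes - 1) * 0.5


def eye_mtf(fr, luminance_fl, num_eyes=1):
    """Eye MTF at frequency fr (cycles per milliradian at the eye)."""
    dpul = pupil_diameter_mm(luminance_fl, num_eyes)
    e0 = (0.7155 + 0.277 / math.sqrt(dpul)) ** 2
    fi = math.exp(3.663 - 0.04974 * dpul ** 2 * math.log10(dpul))
    fe = 43.69 * fr
    optics = math.exp(-((fe / fi) ** e0))
    retina = math.exp(-0.375 * fr ** 1.21)
    tremor = math.exp(-0.4441 * fr * fr)
    return optics * retina * tremor


print(round(pupil_diameter_mm(1.0), 3))   # 4.219, close to the 4.2 mm in Table E-1
print(eye_mtf(0.5, 5.0))
```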


2.  CONTRAST  THRESHOLD  FUNCTION 

The numerical approximation provided by Barten (1992, 2000) is used to predict
sine-wave CTF.

CTF(u) = (a u EXP(-b u) (1 + c EXP(b u))^0.5)^-1
num = 540 (1 + 0.7/L)^-0.2
denom = 1 + 12/(w (1 + u/3)^2)
a = num/denom
b = 0.3 (1 + 100/L)^0.15
c = 0.06


where


u = Spatial frequency in cycles per degree,
w = Square root of picture area in degrees, and
L = Display luminance in cd/m^2.
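
A short Python rendering of this approximation is given below as a sketch. It uses the widely published Barten coefficients (540, 0.7, 12, 3, 0.3, 100, 0.15, 0.06) and takes the contrast threshold as the reciprocal of contrast sensitivity; the function name and the sample inputs are illustrative.

```python
import math


def barten_ctf(u, w, lum):
    """Sine-wave contrast threshold.

    u   : spatial frequency in cycles per degree (must be > 0)
    w   : square root of picture area in degrees
    lum : display luminance in cd/m^2
    """
    num = 540.0 * (1.0 + 0.7 / lum) ** -0.2
    denom = 1.0 + 12.0 / (w * (1.0 + u / 3.0) ** 2)
    a = num / denom
    b = 0.3 * (1.0 + 100.0 / lum) ** 0.15
    c = 0.06
    # Contrast sensitivity; the threshold is its reciprocal.
    csf = a * u * math.exp(-b * u) * math.sqrt(1.0 + c * math.exp(b * u))
    return 1.0 / csf


for u in (1.0, 4.0, 10.0, 30.0):
    print(u, barten_ctf(u, w=10.0, lum=10.0))
```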
The CTF values for a monocular display are increased by √2 [see Volume I, Section
1.802 in Boff and Lincoln (1988)].

3.  CHARGE-COUPLED  DEVICE  MTF 

The  following  parameters  are  used  in  the  equations: 

Dpit = Pitch of detector in milliradians,

Dfil = Fill factor,

Vint = Spacing of pseudo interlace in milliradians,

Cte = Charge transfer efficiency,

Nh = Number of charge transfers,

Vcut = Electronic cutoff as a fraction of 1/(2 Dpit),

Ep = Number of poles in electronic filter, and
Freq = Spatial frequency in cycles/mrad.

Dimensions  in  milliradians  are  found  by  taking  a  thousand  times  the  dimension  in 
centimeters,  multiplying  by  the  fiber-optic  taper  reduction  ratio,  and  then  dividing  by  the 
objective  focal  length. 

The  MTF  associated  with  spatial  integration  by  the  detector  is 

MTFdetector = sin(π Dpit Dfil Freq) / (π Dpit Dfil Freq).

This  applies  both  horizontally  and  vertically.  The  horizontal  MTF  associated  with  the 
electronic  sample  and  hold  is 

MTFsamp/hold = sin(π Dpit Freq) / (π Dpit Freq).

In  addition  to  the  electronic  sample  and  hold,  a  CCD  camera  normally  has  an  electronic 
filter.  The  filter  roll-off  is  applied  to  the  horizontal  and  is  modeled  as: 

MTFfilter = 1/(1 + (Freq/(Vcut/(2 Dpit)))^(2 Ep))^0.5

A  CCD  normally  has  both  horizontal  and  vertical  transfer  registers.  The  MTF  loss  due  to 
a  charge  transfer  efficiency  less  than  1  is  modeled  as  follows. 

MTFcte = EXP(-Nh (1 - Cte) (1 - cos(2 π Dpit Freq))).

Pseudo  interlace  involves  adding  adjacent  vertical  detectors  starting  with  detector  row  1 
on  the  first  field  and  with  detector  row  2  for  the  second  field.  During  video  field  1,  the 
first  video  line  consists  of  the  signals  from  detector  row  1  and  detector  row  2  added 
together.  The  second  video  line  in  field  1  consists  of  detector  row  3  and  detector  row  4 
added  together,  etc.  During  video  field  2,  the  signal  on  the  first  video  line  consists  of 
detector  row  2  and  detector  row  3  added  together.  The  vertical  MTF  associated  with 
pseudo  interlace  is 

MTFinterlace = cos(π Vint Freq).
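
The detector, sample-and-hold, and pseudo-interlace terms are simple enough to sketch directly. The code below covers only those three; the filter and charge-transfer terms are left out because their printed forms are harder to read. Function names are illustrative, and the parameter names follow the list above.

```python
import math


def _sinc(x):
    """sin(pi x) / (pi x), with the limit value 1 at x = 0."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def mtf_detector(freq, dpit, dfil):
    """Spatial integration over the active detector width (Dpit * Dfil)."""
    return _sinc(dpit * dfil * freq)


def mtf_sample_hold(freq, dpit):
    """Horizontal electronic sample and hold over one detector pitch."""
    return _sinc(dpit * freq)


def mtf_interlace(freq, vint):
    """Vertical MTF from adding adjacent detector rows (pseudo interlace)."""
    return math.cos(math.pi * vint * freq)


# Illustrative values only: 0.5 mrad pitch, 100% fill factor.
print(mtf_detector(1.0, dpit=0.5, dfil=1.0))
```

With these forms the detector MTF has its first zero at Freq = 1/(Dpit Dfil), so a smaller fill factor pushes the zero to higher frequency.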
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4.  MTF  OF  FIBER-OPTIC  TAPER 

According to Schott (Siegmund, 1989), the limiting resolution of a fiber-optic taper is
approximately 600 divided by the fiber pitch in µm. Assume that limiting resolution
occurs at 3-percent contrast, and that the MTF is Gaussian. If taper input end pitch is in
µm and objective focal length is in centimeters, then:

MTFreducer = EXP(Coef Freq^2)

Coef = 10,000 Pitch^2 LOGe(.03) / (600 Fo)^2.
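
The coefficient can be sanity-checked numerically. The sketch below assumes the 600/pitch limiting resolution is in line pairs per millimeter, which is what the factor of 10,000 implies when converting to cycles per milliradian; under that assumption the limiting resolution in angular units is 6 Fo / Pitch, and the MTF there should equal 0.03. Function names are illustrative.

```python
import math


def taper_coef(pitch_um, fo_cm):
    """Gaussian coefficient for the fiber-optic taper MTF."""
    return 10000.0 * pitch_um ** 2 * math.log(0.03) / (600.0 * fo_cm) ** 2


def taper_mtf(freq, pitch_um, fo_cm):
    """Taper MTF at freq (cycles per milliradian)."""
    return math.exp(taper_coef(pitch_um, fo_cm) * freq ** 2)


def limiting_resolution(pitch_um, fo_cm):
    """Limiting resolution in cycles/mrad (assumes 600/pitch is in lp/mm)."""
    return 6.0 * fo_cm / pitch_um


pitch, fo = 6.0, 2.0   # illustrative values: 6 micron pitch, 2 cm focal length
f_lim = limiting_resolution(pitch, fo)
print(f_lim, taper_mtf(f_lim, pitch, fo))
```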

5. OUTPUT BRIGHTNESS FOR I² CCD AND CRT DISPLAY

The output brightness for an I² CCD and display is

Bdisp  ~  Dgain  (Bout  Cyolt  GcCD  +  Lmin 

where: 

Bout is defined above except EPtr is reducer transmission,

Cvolt = volt out of CCD per footcandle input,

Gccd = Gain of CCD AGC,

Dgain = fL out of display per input (volt)^gama,
gama = gamma (intensity power law exponent), and
E = (Lmin/Dgain)^(1/gama).

6.  DIFFRACTION-LIMITED  OPTICS  MTF 

The  diffraction-limited  optics  MTF  is  given  by: 

MTFdif = (2/π) (cos^-1(Q) - Q (1 - Q^2)^(1/2)) or

MTFdif = (2/π) (tan^-1(G) - Q (1 - Q^2)^(1/2))


where

Q = λ · Freq/D,

λ = Wavelength in µm,

Freq = Spatial frequency in cycles per milliradian,

D = Fo/F# (Optics aperture diameter in millimeters), and

G = (1 - Q^2)^(1/2)/Q.
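
Both forms of the diffraction-limited MTF can be checked against each other numerically. The sketch below assumes wavelength in microns, aperture in millimeters, and frequency in cycles per milliradian, so that Q is dimensionless; the function names and the sample aperture and wavelength are illustrative.

```python
import math


def diffraction_mtf(freq, wavelength_um, aperture_mm):
    """Diffraction-limited MTF using the arccosine form."""
    q = wavelength_um * freq / aperture_mm
    if q >= 1.0:
        return 0.0   # beyond the optical cutoff
    return (2.0 / math.pi) * (math.acos(q) - q * math.sqrt(1.0 - q * q))


def diffraction_mtf_atan(freq, wavelength_um, aperture_mm):
    """Same MTF using the arctangent form with G = sqrt(1 - Q^2) / Q."""
    q = wavelength_um * freq / aperture_mm
    if q == 0.0:
        return 1.0
    if q >= 1.0:
        return 0.0
    g = math.sqrt(1.0 - q * q) / q
    return (2.0 / math.pi) * (math.atan(g) - q * math.sqrt(1.0 - q * q))


# Illustrative case: 10 micron wavelength, 36.65 mm aperture.
for f in (0.0, 1.0, 2.0, 3.0):
    print(f, diffraction_mtf(f, 10.0, 36.65), diffraction_mtf_atan(f, 10.0, 36.65))
```

The two forms agree because arctan(sqrt(1 - Q^2)/Q) equals arccos(Q) for 0 < Q < 1.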
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